
AMIA Joint Summits on Translational Science proceedings. AMIA Joint Summits on Translational Science: latest articles

Detecting Cerebral Ischemia From Electroencephalography During Carotid Endarterectomy Using Machine Learning.
Amir I Mina, Jessi U Espino, Allison M Bradley, Parthasarathy D Thirumala, Kayhan Batmanghelich, Shyam Visweswaran

Monitoring cerebral neuronal activity via electroencephalography (EEG) during surgery can detect ischemia, a precursor to stroke. However, current neurophysiologist-based monitoring is prone to error. In this study, we evaluated machine learning (ML) for efficient and accurate ischemia detection. We trained supervised ML models on a dataset of 802 patients with intraoperative ischemia labels and evaluated them on an independent validation dataset of 30 patients with refined labels from five neurophysiologists. Our results show moderate-to-substantial agreement between neurophysiologists, with Cohen's kappa values between 0.59 and 0.74. Neurophysiologist performance ranged from 58-93% for sensitivity and 83-96% for specificity, while ML models demonstrated comparable ranges of 63-89% and 85-96%. Random Forest (RF), LightGBM (LGBM), and XGBoost RF achieved area under the receiver operating characteristic curve (AUROC) values of 0.92-0.93 and area under the precision-recall curve (AUPRC) values of 0.79-0.83. ML has the potential to improve intraoperative monitoring, enhancing patient safety and reducing costs.
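
The agreement and accuracy figures quoted in this abstract (Cohen's kappa, sensitivity, specificity) can be computed from paired binary labels. The following is a minimal sketch, not the authors' code: the function names and the ten example labels are made up for illustration, with 1 standing for "ischemia" and 0 for "no ischemia".

```python
def sensitivity_specificity(reference, predicted):
    """Return (sensitivity, specificity) for binary labels, 1 = positive."""
    pairs = list(zip(reference, predicted))
    tp = sum(r == 1 and p == 1 for r, p in pairs)
    fn = sum(r == 1 and p == 0 for r, p in pairs)
    tn = sum(r == 0 and p == 0 for r, p in pairs)
    fp = sum(r == 0 and p == 1 for r, p in pairs)
    return tp / (tp + fn), tn / (tn + fp)


def cohens_kappa(rater_a, rater_b):
    """Agreement between two raters, corrected for agreement expected by chance."""
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    p_a = sum(rater_a) / n  # share of positives given by rater A
    p_b = sum(rater_b) / n  # share of positives given by rater B
    expected = p_a * p_b + (1 - p_a) * (1 - p_b)
    return (observed - expected) / (1 - expected)


# Hypothetical labels for ten EEG segments; not data from the study.
reference = [1, 1, 1, 0, 0, 0, 0, 0, 1, 0]
rater = [1, 1, 0, 0, 0, 0, 1, 0, 1, 0]

sens, spec = sensitivity_specificity(reference, rater)
kappa = cohens_kappa(reference, rater)
print(f"sensitivity={sens:.2f} specificity={spec:.2f} kappa={kappa:.2f}")
```

The same two functions apply whether the second label vector comes from a neurophysiologist or from a thresholded model output, which is what makes the human and ML ranges in the abstract directly comparable.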

Published: 2024-05-31 | Type: Journal Article | PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11141851/pdf/
Citations: 0
Exploring Large Language Models for Acronym, Symbol Sense Disambiguation, and Semantic Similarity and Relatedness Assessment.
Ying Liu, Genevieve B Melton, Rui Zhang

Acronyms, abbreviations, and symbols play a significant role in clinical notes. Acronym and symbol sense disambiguation are crucial natural language processing (NLP) tasks that ensure the clarity and consistency of clinical notes and downstream NLP processing. Previous studies using traditional machine learning methods have been relatively successful in tackling this issue. In our research, we evaluated large language models (LLMs), including ChatGPT 3.5 and 4, as well as other open LLMs and BERT-based models, across three NLP tasks: acronym and symbol sense disambiguation, semantic similarity, and relatedness. Our findings emphasize ChatGPT's remarkable ability to distinguish between senses with minimal or zero-shot training. Additionally, the open-source LLM Mixtral-8x7B exhibited high accuracy for acronyms with fewer senses and moderate accuracy for symbol sense disambiguation. BERT-based models outperformed previous machine learning approaches, achieving an impressive accuracy rate of over 95%, showcasing their effectiveness in addressing the challenge of acronym and symbol sense disambiguation. Furthermore, ChatGPT exhibited a strong correlation, surpassing 70%, with human gold standards when evaluating similarity and relatedness.

Published: 2024-05-31 | Type: Journal Article | PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11141821/pdf/
Citations: 0
FHIRing up OpenMRS: Architecture, Implementation and Real-World Use-Cases in Global Health.
I Bacher, M Goodrich, A Kimaina, M Seaton, G Faulkenberry, S Vaish, J Flowers, H S Fraser

HL7 FHIR was created almost a decade ago and is seeing increasingly wide use in high-income settings. Although some initial work was carried out in low- and middle-income country (LMIC) settings, there has been little impact until recently. The need for reliable and easy-to-implement interoperability between health information systems (HIS) in LMICs is growing with large-scale deployments of EHRs, national reporting systems, and mHealth applications. The OpenMRS open-source EHR has been deployed in more than 44 LMICs, with increasing needs for interoperability with other HIS. We describe here the development and deployment of a new FHIR module supporting the latest standards and its use in interoperability with laboratory systems, mHealth applications, and pharmacy dispensing systems, and as a tool for supporting advanced user interface designs. We also show how it facilitates data science projects and the deployment of machine learning-based CDSS and precision medicine in LMICs.
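
To make the interoperability described in this abstract concrete, the sketch below builds a FHIR RESTful search URL of the general form `[base]/[type]?parameters`. It is only an illustration under stated assumptions: the server address and patient id are hypothetical placeholders, the helper name `fhir_search_url` is made up, and nothing here is taken from the OpenMRS FHIR module itself. No network request is made.

```python
from urllib.parse import urlencode


def fhir_search_url(base_url, resource_type, params):
    """Build a FHIR search URL: [base]/[type]?name=value&name=value."""
    return f"{base_url.rstrip('/')}/{resource_type}?{urlencode(params)}"


# Hypothetical server address and patient id, for illustration only.
BASE_URL = "https://ehr.example.org/fhir/R4"

url = fhir_search_url(
    BASE_URL,
    "Observation",
    {"patient": "example-patient-id", "_count": "10"},
)
print(url)
```

A laboratory system or mHealth client would issue an HTTP GET against such a URL and receive a FHIR Bundle of matching resources; the exact base path of a given deployment should be taken from its own documentation.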

Published: 2024-05-31 | Type: Journal Article | PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11141833/pdf/
Citations: 0
Multivariate mediation analysis with voxel-based morphometry revealed the neurodegeneration pathways from genetic variants to Alzheimer's Disease.
Shizhuo Mu, Jingxuan Bao, Hanxiang Xu, Manu Shivakumar, Shu Yang, Xia Ning, Dokyoon Kim, Christos Davatzikos, Haochang Shou, Li Shen

Neurodegenerative processes are increasingly recognized as potential causative factors in Alzheimer's disease (AD) pathogenesis. While many studies have leveraged mediation analysis models to elucidate the underlying mechanisms linking genetic variants to AD diagnostic outcomes, the majority have focused on regional brain measures as mediators, thereby compromising the granularity of the imaging data. In our investigation, using the imaging genetics data from a landmark AD cohort, we contrasted region-based and voxel-based brain measurements as imaging endophenotypes and examined their roles in mediating genetic effects on AD outcomes. Our findings underscored that using voxel-based morphometry offers enhanced statistical power. Moreover, we delineated specific mediation pathways among SNPs, brain volume, and AD outcomes, shedding light on the intricate relationship among these variables.
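
For readers unfamiliar with mediation analysis, the textbook single-mediator linear model is sketched below. This is an assumption-laden illustration, not the multivariate model of the paper, which the abstract does not spell out. The symbols are introduced here: $X$ is the genetic variant (SNP), $M$ the imaging measure (a regional or voxel-wise volume), and $Y$ the AD outcome.

```latex
\begin{align}
M &= \alpha_0 + a\,X + \varepsilon_M, \\
Y &= \beta_0 + c'\,X + b\,M + \varepsilon_Y, \\
\text{indirect effect} &= a\,b, \qquad
\text{direct effect} = c', \qquad
\text{total effect } c = c' + a\,b .
\end{align}
```

In this linear setting the product $ab$ measures how much of the SNP's effect on the outcome passes through the brain measure. With a binary diagnostic outcome the second equation is usually replaced by a logistic model, in which case the decomposition $c = c' + ab$ holds only approximately.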

Published: 2024-05-31 | Type: Journal Article | PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11141831/pdf/
Citations: 0
Text and Audio Simplification: Human vs. ChatGPT.
Gondy Leroy, David Kauchak, Philip Harber, Ankit Pal, Akash Shukla

Text and audio simplification to increase information comprehension are important in healthcare. With the introduction of ChatGPT, evaluation of its simplification performance is needed. We provide a systematic comparison of human and ChatGPT simplified texts using fourteen metrics indicative of text difficulty. We briefly introduce our online editor where these simplification tools, including ChatGPT, are available. We scored twelve corpora using our metrics: six text, one audio, and five ChatGPT simplified corpora (using five different prompts). We then compared these corpora with texts simplified and verified in a prior user study. Finally, a medical domain expert evaluated the user study texts and five new ChatGPT simplified versions. We found that simple corpora show higher similarity with the human simplified texts. ChatGPT simplification moves metrics in the right direction. The medical domain expert's evaluation showed a preference for the ChatGPT style, but the text itself was rated lower for content retention.

Published: 2024-05-31 | Type: Journal Article | PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11141852/pdf/
Citations: 0
Cross-phenotype associations between Alzheimer's Disease and its comorbidities may provide clues to progression.
Anni Moore, Marylyn D Ritchie

Alzheimer's disease (AD) is the most prevalent neurodegenerative disease worldwide, with one in nine people over the age of 65 living with the disease in 2023. In this study, we used a phenome-wide association study (PheWAS) approach to identify cross-phenotype associations between previously identified genetic associations for AD and electronic health record (EHR) diagnoses from the UK Biobank (UKBB) (n=361,194 of European ancestry) and the eMERGE Network (n=105,108 of diverse ancestry). Based on 497 previously identified AD-associated variants from the Alzheimer's Disease Variant Portal (ADVP), we found significant associations primarily in immune and cardiac related diseases in our PheWAS. Replicating variants have widespread impacts on immune genes in diverse tissue types. This study demonstrates the potential of using the PheWAS strategy to improve our understanding of AD progression as well as identify potential drug repurposing opportunities for new treatment and disease prevention strategies.

Published: 2024-05-31 | Type: Journal Article | PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11141840/pdf/
Citations: 0
Clinical Note Structural Knowledge Improves Word Sense Disambiguation.
Fangyi Chen, Gongbo Zhang, Si Chen, Tiffany Callahan, Chunhua Weng

Clinical notes are full of ambiguous medical abbreviations. Contextual knowledge has been leveraged by recent learning-based approaches for sense disambiguation. Previous findings indicated that structural elements of clinical notes entail useful characteristics for informing different interpretations of abbreviations, yet they have remained underutilized and have not been fully investigated. To the best of our knowledge, the only study exploring note structures simply enumerated the headers in the notes, where such representations are not semantically meaningful. This paper describes a learning-based approach using the note structure represented by the semantic types predefined in the Unified Medical Language System (UMLS). We evaluated the representation in addition to the widely used N-gram with three learning models on two different datasets. Experiments indicate that our feature augmentation consistently improved model performance for abbreviation disambiguation, with the optimal F1 score of 0.93.
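
The feature augmentation described in this abstract can be pictured as adding one structural feature to the usual N-gram context features. The sketch below is a hedged illustration only: the function names, the window size, and the section label `hypothetical_section_type` are assumptions made here, and the abstract does not say how the authors map note sections to UMLS semantic types.

```python
def ngram_features(tokens, index, window=2, n_max=2):
    """Unigrams and bigrams from a window around the abbreviation at `index`."""
    lo = max(0, index - window)
    hi = min(len(tokens), index + window + 1)
    span = [t.lower() for t in tokens[lo:hi]]
    span[index - lo] = "<abbr>"  # mask the target so features describe its context
    features = set()
    for n in range(1, n_max + 1):
        for i in range(len(span) - n + 1):
            features.add("ng=" + " ".join(span[i:i + n]))
    return features


def augmented_features(tokens, index, section_semantic_type):
    """N-gram context features plus one feature for the enclosing note section."""
    features = ngram_features(tokens, index)
    features.add("sec_type=" + section_semantic_type)
    return features


# Hypothetical sentence and section label, for illustration only.
tokens = "Patient given RA treatment today".split()
feats = augmented_features(tokens, 2, "hypothetical_section_type")
print(sorted(feats))
```

Any classifier that accepts sparse binary features can consume such a set; the point of the extra feature is that the same abbreviation may favor different senses in different kinds of sections.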

Published: 2024-05-31 | Type: Journal Article | PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11141859/pdf/
Citations: 0
Enabling Semantic Topic Modeling on Twitter Using MetaMap.
Rebecca Shyu, Chunhua Weng

Topic modeling performs poorly on short phrases or sentences and ever-changing slang, which are common in social media, such as X, formerly known as Twitter. This study investigates whether concept annotation tools such as MetaMap can enable topic modeling at the semantic level. Using tweets mentioning "hydroxychloroquine" for a case study, we extracted 56,017 tweets posted between 03/01/2020 and 12/31/2021. The tweets were run through MetaMap to encode concepts with UMLS Concept Unique Identifiers (CUIs), and then we used Latent Dirichlet Allocation (LDA) to identify the optimal model for two datasets: 1) tweets with the original text and 2) tweets with the replaced CUIs. We found that the MetaMap LDA models outperformed the non-MetaMap models in terms of coherence and representativeness and identified topics that were timely and relevant to social and political discussions. We concluded that integrating MetaMap to standardize tweets through UMLS concepts improved semantic topic modeling performance amidst noise in the text.
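
The preprocessing idea in this abstract, replacing surface phrases with concept identifiers before topic modeling, can be sketched as follows. This is an illustration under loud assumptions: in the study the mapping comes from MetaMap, whereas here a small hand-written table stands in for it, and the identifiers `CUI_0001` and `CUI_0002` are placeholders rather than real UMLS CUIs. The LDA step itself is not shown.

```python
import re

# Hypothetical phrase-to-identifier table. The identifiers are placeholders,
# not real UMLS CUIs; in the study this mapping is produced by MetaMap.
PHRASE_TO_CUI = {
    "hydroxychloroquine": "CUI_0001",
    "hcq": "CUI_0001",
    "clinical trial": "CUI_0002",
}


def replace_with_cuis(text, table):
    """Replace each known phrase with its concept identifier, longest phrase first."""
    out = text.lower()
    for phrase in sorted(table, key=len, reverse=True):
        out = re.sub(rf"\b{re.escape(phrase)}\b", table[phrase], out)
    return out


# Made-up example tweets, not taken from the study's dataset.
tweets = [
    "HCQ clinical trial results are out",
    "New hydroxychloroquine clinical trial announced",
]
normalized = [replace_with_cuis(t, PHRASE_TO_CUI) for t in tweets]
print(normalized)
```

After normalization the slang form and the full drug name share one token, so a bag-of-words topic model counts them together instead of treating them as unrelated words.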

Published: 2024-05-31 | Type: Journal Article | PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11141808/pdf/
Citations: 0
Crowdsourcing with Enhanced Data Quality Assurance: An Efficient Approach to Mitigate Resource Scarcity Challenges in Training Large Language Models for Healthcare.
Prosanta Barai, Gondy Leroy, Prakash Bisht, Joshua M Rothman, Sumi Lee, Jennifer Andrews, Sydney A Rice, Arif Ahmed

Large Language Models (LLMs) have demonstrated immense potential in artificial intelligence across various domains, including healthcare. However, their efficacy is hindered by the need for high-quality labeled data, which is often expensive and time-consuming to create, particularly in low-resource domains like healthcare. To address these challenges, we propose a crowdsourcing (CS) framework enriched with quality control measures at the pre-, real-time-, and post-data gathering stages. Our study evaluated the effectiveness of enhancing data quality through its impact on LLMs (Bio-BERT) for predicting autism-related symptoms. The results show that real-time quality control improves data quality by 19% compared to pre-quality control. Fine-tuning Bio-BERT using crowdsourced data generally increased recall compared to the Bio-BERT baseline but lowered precision. Our findings highlighted the potential of crowdsourcing and quality control in resource-constrained environments and offered insights into optimizing healthcare LLMs for informed decision-making and improved patient care.

Published: 2024-05-31 | Type: Journal Article | PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11141838/pdf/
Citations: 0
Local Large Language Models for Complex Structured Tasks.
V K Cody Bumgardner, Aaron Mullen, Samuel E Armstrong, Caylin Hickey, Victor Marek, Jeff Talbert

This paper introduces an approach that combines the language reasoning capabilities of large language models (LLMs) with the benefits of local training to tackle complex language tasks. The authors demonstrate their approach by extracting structured condition codes from pathology reports. The proposed approach utilizes local, fine-tuned LLMs to respond to specific generative instructions and provide structured outputs. Over 150k uncurated surgical pathology reports containing gross descriptions, final diagnoses, and condition codes were used. Different model architectures were trained and evaluated, including LLaMA, BERT, and LongFormer. The results show that the LLaMA-based models significantly outperform BERT-style models across all evaluated metrics. LLaMA models performed especially well with large datasets, demonstrating their ability to handle complex, multi-label tasks. Overall, this work presents an effective approach for utilizing LLMs to perform structured generative tasks on domain-specific language in the medical domain.
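The idea of a generative instruction that yields a structured, multi-label output can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the prompt wording, the JSON reply shape, the `ALLOWED_CODES` set, and the function names `build_prompt` and `parse_codes` are all hypothetical, and no model is actually called.

```python
import json

# Hypothetical label set; the abstract does not list the real condition codes.
ALLOWED_CODES = {"C01", "C02", "C03"}


def build_prompt(gross_description: str, final_diagnosis: str) -> str:
    """Assemble an instruction asking for condition codes as JSON (assumed format)."""
    return (
        "Assign condition codes to the pathology report below.\n"
        'Reply with JSON only, e.g. {"condition_codes": ["C01"]}.\n\n'
        f"Gross description: {gross_description}\n"
        f"Final diagnosis: {final_diagnosis}\n"
    )


def parse_codes(model_output: str) -> list[str]:
    """Parse the model's reply and keep only known codes, sorted and de-duplicated."""
    try:
        payload = json.loads(model_output)
    except json.JSONDecodeError:
        return []  # the reply was not valid JSON
    if not isinstance(payload, dict):
        return []
    codes = payload.get("condition_codes", [])
    if not isinstance(codes, list):
        return []
    return sorted({c for c in codes if isinstance(c, str) and c in ALLOWED_CODES})


# Example with a made-up model reply; "X99" is dropped because it is not an allowed code.
reply = '{"condition_codes": ["C02", "C01", "X99"]}'
print(parse_codes(reply))
```

Validating the reply against a fixed label set is one simple way to keep a generative model's output usable as a multi-label prediction; other designs, such as constrained decoding, are equally plausible.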

Published: 2024-05-31. Publication type: Journal Article. Journal: AMIA Joint Summits on Translational Science proceedings. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11141822/pdf/
Citations: 0
Journal
AMIA Joint Summits on Translational Science proceedings. AMIA Joint Summits on Translational Science