Amir I Mina, Jessi U Espino, Allison M Bradley, Parthasarathy D Thirumala, Kayhan Batmanghelich, Shyam Visweswaran
Monitoring cerebral neuronal activity via electroencephalography (EEG) during surgery can detect ischemia, a precursor to stroke. However, current neurophysiologist-based monitoring is prone to error. In this study, we evaluated machine learning (ML) for efficient and accurate ischemia detection. We trained supervised ML models on a dataset of 802 patients with intraoperative ischemia labels and evaluated them on an independent validation dataset of 30 patients with refined labels from five neurophysiologists. Our results show moderate-to-substantial agreement between neurophysiologists, with Cohen's kappa values between 0.59 and 0.74. Neurophysiologist performance ranged from 58-93% for sensitivity and 83-96% for specificity, while ML models demonstrated comparable ranges of 63-89% and 85-96%. Random Forest (RF), LightGBM (LGBM), and XGBoost RF achieved area under the receiver operating characteristic curve (AUROC) values of 0.92-0.93 and area under the precision-recall curve (AUPRC) values of 0.79-0.83. ML has the potential to improve intraoperative monitoring, enhancing patient safety and reducing costs.
{"title":"Detecting Cerebral Ischemia From Electroencephalography During Carotid Endarterectomy Using Machine Learning.","authors":"Amir I Mina, Jessi U Espino, Allison M Bradley, Parthasarathy D Thirumala, Kayhan Batmanghelich, Shyam Visweswaran","doi":"","DOIUrl":"","url":null,"abstract":"<p><p>Monitoring cerebral neuronal activity via electroencephalography (EEG) during surgery can detect ischemia, a precursor to stroke. However, current neurophysiologist-based monitoring is prone to error. In this study, we evaluated machine learning (ML) for efficient and accurate ischemia detection. We trained supervised ML models on a dataset of 802 patients with intraoperative ischemia labels and evaluated them on an independent validation dataset of 30 patients with refined labels from five neurophysiologists. Our results show moderate-to-substantial agreement between neurophysiologists, with Cohen's kappa values between 0.59 and 0.74. Neurophysiologist performance ranged from 58-93% for sensitivity and 83-96% for specificity, while ML models demonstrated comparable ranges of 63-89% and 85-96%. Random Forest (RF), LightGBM (LGBM), and XGBoost RF achieved area under the receiver operating characteristic curve (AUROC) values of 0.92-0.93 and area under the precision-recall curve (AUPRC) values of 0.79-0.83. ML has the potential to improve intraoperative monitoring, enhancing patient safety and reducing costs.</p>","PeriodicalId":72181,"journal":{"name":"AMIA Joint Summits on Translational Science proceedings. 
AMIA Joint Summits on Translational Science","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-05-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11141851/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141200166","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
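The study's headline numbers (AUROC, AUPRC, Cohen's kappa) can be reproduced in miniature. This is a minimal sketch assuming synthetic data and hypothetical EEG-derived features, not the paper's actual pipeline or dataset:

```python
# Sketch: scoring a Random Forest ischemia detector with AUROC/AUPRC and
# Cohen's kappa, as reported in the study. Data is synthetic; the features
# stand in for (hypothetical) EEG band-power measurements.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score, average_precision_score, cohen_kappa_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 2000
X = rng.normal(size=(n, 8))                       # hypothetical EEG features
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(size=n) > 1.2).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0, stratify=y)
clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_tr, y_tr)
scores = clf.predict_proba(X_te)[:, 1]

auroc = roc_auc_score(y_te, scores)               # threshold-free ranking quality
auprc = average_precision_score(y_te, scores)     # area under precision-recall
kappa = cohen_kappa_score(y_te, clf.predict(X_te))  # chance-corrected agreement
print(auroc, auprc, kappa)
```

AUPRC is the more informative of the two when ischemia events are rare, since it ignores the large pool of true negatives that inflates AUROC.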
Acronyms, abbreviations, and symbols play a significant role in clinical notes. Acronym and symbol sense disambiguation are crucial natural language processing (NLP) tasks that ensure the clarity and consistency of clinical notes and downstream NLP processing. Previous studies using traditional machine learning methods have been relatively successful in tackling this issue. In our research, we evaluated large language models (LLMs), including ChatGPT 3.5 and 4, as well as other open LLMs and BERT-based models, across three NLP tasks: acronym and symbol sense disambiguation, semantic similarity, and relatedness. Our findings emphasize ChatGPT's remarkable ability to distinguish between senses with minimal or zero-shot training. Additionally, the open-source LLM Mixtral-8x7B exhibited high accuracy for acronyms with fewer senses and moderate accuracy for symbol senses. BERT-based models outperformed previous machine learning approaches, achieving an accuracy rate of over 95%, showcasing their effectiveness in addressing the challenge of acronym and symbol sense disambiguation. Furthermore, ChatGPT exhibited a strong correlation, surpassing 70%, with human gold standards when evaluating similarity and relatedness.
{"title":"Exploring Large Language Models for Acronym, Symbol Sense Disambiguation, and Semantic Similarity and Relatedness Assessment.","authors":"Ying Liu, Genevieve B Melton, Rui Zhang","doi":"","DOIUrl":"","url":null,"abstract":"<p><p>Acronyms, abbreviations, and symbols play a significant role in clinical notes. Acronym and symbol sense disambiguation are crucial natural language processing (NLP) tasks that ensure the clarity and consistency of clinical notes and downstream NLP processing. Previous studies using traditional machine learning methods have been relatively successful in tackling this issue. In our research, we conducted an evaluation of large language models (LLMs), including ChatGPT 3.5 and 4, as well as other open LLMs, and BERT-based models, across three NLP tasks: acronym and symbol sense disambiguation, semantic similarity, and relatedness. Our findings emphasize ChatGPT's remarkable ability to distinguish between senses with minimal or zero-shot training. Additionally, open source LLM Mixtrial-8x7B exhibited high accuracy for acronyms with fewer senses, and moderate accuracy for symbol sense accuracy. BERT-based models outperformed previous machine learning approaches, achieving an impressive accuracy rate of over 95%, showcasing their effectiveness in addressing the challenge of acronym and symbol sense disambiguation. Furthermore, ChatGPT exhibited a strong correlation, surpassing 70%, with human gold standards when evaluating similarity and relatedness.</p>","PeriodicalId":72181,"journal":{"name":"AMIA Joint Summits on Translational Science proceedings. 
AMIA Joint Summits on Translational Science","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-05-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11141821/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141200809","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
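Zero-shot sense disambiguation of the kind evaluated here amounts to asking the model to pick from a sense inventory given context. A minimal sketch of that prompt framing, with a hypothetical sense inventory and no real API call:

```python
# Sketch: framing acronym sense disambiguation as a zero-shot prompt.
# The sense inventory and prompt wording are illustrative assumptions,
# not the paper's actual evaluation protocol.
SENSES = {"RA": ["rheumatoid arthritis", "right atrium", "room air"]}

def build_prompt(acronym: str, context: str) -> str:
    """Build a single zero-shot disambiguation prompt for one acronym mention."""
    options = "; ".join(SENSES[acronym])
    return (
        f"In the clinical note excerpt below, which sense does '{acronym}' have?\n"
        f"Options: {options}\n"
        f"Excerpt: {context}\n"
        "Answer with exactly one option."
    )

prompt = build_prompt("RA", "Patient saturating well on RA, no supplemental O2.")
print(prompt)
```

The returned string would be sent to the LLM; accuracy is then the fraction of mentions where the model's choice matches the annotated sense.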
I Bacher, M Goodrich, A Kimaina, M Seaton, G Faulkenberry, S Vaish, J Flowers, H S Fraser
HL7 FHIR was created almost a decade ago and is seeing increasingly wide use in high-income settings. Although some initial work was carried out in low- and middle-income country (LMIC) settings, there had been little impact until recently. The need for reliable, easy-to-implement interoperability between health information systems in LMICs is growing with large-scale deployments of EHRs, national reporting systems, and mHealth applications. The OpenMRS open-source EHR has been deployed in more than 44 LMICs, with increasing needs for interoperability with other health information systems. We describe here the development and deployment of a new FHIR module supporting the latest standards, and its use in interoperability with laboratory systems, mHealth applications, and pharmacy dispensing systems, as well as a tool for supporting advanced user interface designs. We also show how it facilitates data science projects and the deployment of machine learning-based CDSS and precision medicine in LMICs.
{"title":"FHIRing up OpenMRS: Architecture, Implementation and Real-World Use-Cases in Global Health.","authors":"I Bacher, M Goodrich, A Kimaina, M Seaton, G Faulkenberry, S Vaish, J Flowers, H S Fraser","doi":"","DOIUrl":"","url":null,"abstract":"<p><p>HL7 FHIR was created almost a decade ago and is seeing increasingly wide use in high income settings. Although some initial work was carried out in low and middle income (LMIC) settings there has been little impact until recently. The need for reliable and easy to implement interoperability between health information systems in LMICs is growing with large scale deployments of EHRs, national reporting systems and mHealth applications. The OpenMRS open source EHR has been deployed in more than 44 LMIC with increasing needs for interoperability with other HIS. We describe here the development and deployment of a new FHIR module supporting the latest standards and its use in interoperability with laboratory systems, mHealth applications, pharmacy dispensing system and as a tool for supporting advanced user interface designs. We also show how it facilitates date science projects and deployment of machine leaning based CDSS and precision medicine in LMICs.</p>","PeriodicalId":72181,"journal":{"name":"AMIA Joint Summits on Translational Science proceedings. AMIA Joint Summits on Translational Science","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-05-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11141833/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141200861","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Shizhuo Mu, Jingxuan Bao, Hanxiang Xu, Manu Shivakumar, Shu Yang, Xia Ning, Dokyoon Kim, Christos Davatzikos, Haochang Shou, Li Shen
Neurodegenerative processes are increasingly recognized as potential causative factors in Alzheimer's disease (AD) pathogenesis. While many studies have leveraged mediation analysis models to elucidate the underlying mechanisms linking genetic variants to AD diagnostic outcomes, the majority have predominantly focused on regional brain measures as mediators, thereby compromising the granularity of the imaging data. In our investigation, using imaging genetics data from a landmark AD cohort, we contrasted region-based and voxel-based brain measurements as imaging endophenotypes and examined their roles in mediating genetic effects on AD outcomes. Our findings underscored that using voxel-based morphometry offers enhanced statistical power. Moreover, we delineated specific mediation pathways between SNPs, brain volume, and AD outcomes, shedding light on the intricate relationships among these variables.
{"title":"Multivariate mediation analysis with voxel-based morphometry revealed the neurodegeneration pathways from genetic variants to Alzheimer's Disease.","authors":"Shizhuo Mu, Jingxuan Bao, Hanxiang Xu, Manu Shivakumar, Shu Yang, Xia Ning, Dokyoon Kim, Christos Davatzikos, Haochang Shou, Li Shen","doi":"","DOIUrl":"","url":null,"abstract":"<p><p>Neurodegenerative processes are increasingly recognized as potential causative factors in Alzheimer's disease (AD) pathogenesis. While many studies have leveraged mediation analysis models to elucidate the underlying mechanisms linking genetic variants to AD diagnostic outcomes, the majority have predominantly focused on regional brain measure as a mediator, thereby compromising the granularity of the imaging data. In our investigation, using the imaging genetics data from a landmark AD cohort, we contrasted both region-based and voxel-based brain measurements as imaging endophenotypes, and examined their roles in mediating genetic effects on AD outcomes. Our findings underscored that using voxel-based morphometry offers enhanced statistical power. Moreover, we delineated specific mediation pathways between SNP, brain volume, and AD outcomes, shedding light on the intricate relationship among these variables.</p>","PeriodicalId":72181,"journal":{"name":"AMIA Joint Summits on Translational Science proceedings. AMIA Joint Summits on Translational Science","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-05-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11141831/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141201193","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Gondy Leroy, David Kauchak, Philip Harber, Ankit Pal, Akash Shukla
Text and audio simplification to increase information comprehension are important in healthcare. With the introduction of ChatGPT, evaluation of its simplification performance is needed. We provide a systematic comparison of human- and ChatGPT-simplified texts using fourteen metrics indicative of text difficulty. We briefly introduce our online editor where these simplification tools, including ChatGPT, are available. We scored twelve corpora using our metrics: six text, one audio, and five ChatGPT-simplified corpora (using five different prompts). We then compare these corpora with texts simplified and verified in a prior user study. Finally, a medical domain expert evaluated the user study texts and five new ChatGPT-simplified versions. We found that the simplified corpora show higher similarity with the human-simplified texts. ChatGPT simplification moves metrics in the right direction. The medical domain expert's evaluation showed a preference for the ChatGPT style, but the text itself was rated lower for content retention.
{"title":"Text and Audio Simplification: Human vs. ChatGPT.","authors":"Gondy Leroy, David Kauchak, Philip Harber, Ankit Pal, Akash Shukla","doi":"","DOIUrl":"","url":null,"abstract":"<p><p>Text and audio simplification to increase information comprehension are important in healthcare. With the introduction of ChatGPT, evaluation of its simplification performance is needed. We provide a systematic comparison of human and ChatGPT simplified texts using fourteen metrics indicative of text difficulty. We briefly introduce our online editor where these simplification tools, including ChatGPT, are available. We scored twelve corpora using our metrics: six text, one audio, and five ChatGPT simplified corpora (using five different prompts). We then compare these corpora with texts simplified and verified in a prior user study. Finally, a medical domain expert evaluated the user study texts and five, new ChatGPT simplified versions. We found that simple corpora show higher similarity with the human simplified texts. ChatGPT simplification moves metrics in the right direction. The medical domain expert's evaluation showed a preference for the ChatGPT style, but the text itself was rated lower for content retention.</p>","PeriodicalId":72181,"journal":{"name":"AMIA Joint Summits on Translational Science proceedings. AMIA Joint Summits on Translational Science","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-05-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11141852/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141201281","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Alzheimer's disease (AD) is the most prevalent neurodegenerative disease worldwide, with one in nine people over the age of 65 living with the disease in 2023. In this study, we used a phenome-wide association study (PheWAS) approach to identify cross-phenotype associations between previously identified genetic associations for AD and electronic health record (EHR) diagnoses from the UK Biobank (UKBB) (n=361,194 of European ancestry) and the eMERGE Network (n=105,108 of diverse ancestry). Based on 497 previously identified AD-associated variants from the Alzheimer's Disease Variant Portal (ADVP), we found significant associations primarily in immune- and cardiac-related diseases in our PheWAS. Replicating variants have widespread impacts on immune genes in diverse tissue types. This study demonstrates the potential of using the PheWAS strategy to improve our understanding of AD progression as well as identify potential drug repurposing opportunities for new treatment and disease prevention strategies.
{"title":"Cross-phenotype associations between Alzheimer's Disease and its comorbidities may provide clues to progression.","authors":"Anni Moore, Marylyn D Ritchie","doi":"","DOIUrl":"","url":null,"abstract":"<p><p>Alzheimer's disease (AD) is the most prevalent neurodegenerative disease worldwide, with one in nine people over the age of 65 living with the disease in 2023. In this study, we used a phenome wide association study (PheWAS) approach to identify cross-phenotype between previously identified genetic associations for AD and electronic health record (EHR) diagnoses from the UK Biobank (UKBB) (n=361,194 of European ancestry) and the eMERGE Network (n=105,108 of diverse ancestry). Based on 497 previously identified AD-associated variants from the Alzheimer's Disease Variant Portal (ADVP), we found significant associations primarily in immune and cardiac related diseases in our PheWAS. Replicating variants have widespread impacts on immune genes in diverse tissue types. This study demonstrates the potential of using the PheWAS strategy to improve our understanding of AD progression as well as identify potential drug repurposing opportunities for new treatment and disease prevention strategies.</p>","PeriodicalId":72181,"journal":{"name":"AMIA Joint Summits on Translational Science proceedings. AMIA Joint Summits on Translational Science","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-05-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11141840/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141200174","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Fangyi Chen, Gongbo Zhang, Si Chen, Tiffany Callahan, Chunhua Weng
Clinical notes are full of ambiguous medical abbreviations. Contextual knowledge has been leveraged by recent learning-based approaches for sense disambiguation. Previous findings indicated that structural elements of clinical notes entail useful characteristics for informing different interpretations of abbreviations, yet they have remained underutilized and have not been fully investigated. To the best of our knowledge, the only study exploring note structures simply enumerated the headers in the notes, and such representations are not semantically meaningful. This paper describes a learning-based approach using the note structure represented by the semantic types predefined in the Unified Medical Language System (UMLS). We evaluated the representation, in addition to the widely used N-gram, with three learning models on two different datasets. Experiments indicate that our feature augmentation consistently improved model performance for abbreviation disambiguation, with an optimal F1 score of 0.93.
{"title":"Clinical Note Structural Knowledge Improves Word Sense Disambiguation.","authors":"Fangyi Chen, Gongbo Zhang, Si Chen, Tiffany Callahan, Chunhua Weng","doi":"","DOIUrl":"","url":null,"abstract":"<p><p>Clinical notes are full of ambiguous medical abbreviations. Contextual knowledge has been leveraged by recent learning-based approaches for sense disambiguation. Previous findings indicated that structural elements of clinical notes entail useful characteristics for informing different interpretations of abbreviations, yet they have remained underutilized and have not been fully investigated. To our best knowledge, the only study exploring note structures simply enumerated the headers in the notes, where such representations are not semantically meaningful. This paper describes a learning-based approach using the note structure represented by the semantic types predefined in Unified Medical Language System (UMLS). We evaluated the representation in addition to the widely used N-gram with three learning models on two different datasets. Experiments indicate that our feature augmentation consistently improved model performance for abbreviation disambiguation, with the optimal F1 score of 0.93.</p>","PeriodicalId":72181,"journal":{"name":"AMIA Joint Summits on Translational Science proceedings. AMIA Joint Summits on Translational Science","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-05-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11141859/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141198959","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Topic modeling performs poorly on short phrases or sentences and ever-changing slang, which are common in social media, such as X, formerly known as Twitter. This study investigates whether concept annotation tools such as MetaMap can enable topic modeling at the semantic level. Using tweets mentioning "hydroxychloroquine" as a case study, we extracted 56,017 tweets posted between 03/01/2020 and 12/31/2021. The tweets were run through MetaMap to encode concepts with UMLS Concept Unique Identifiers (CUIs), and then we used Latent Dirichlet Allocation (LDA) to identify the optimal model for two datasets: 1) tweets with the original text and 2) tweets with the replaced CUIs. We found that the MetaMap LDA models outperformed the non-MetaMap models in terms of coherence and representativeness and identified topics relevant to contemporaneous social and political discussions. We concluded that integrating MetaMap to standardize tweets through UMLS concepts improved semantic topic modeling performance amidst noise in the text.
{"title":"Enabling Semantic Topic Modeling on Twitter Using MetaMap.","authors":"Rebecca Shyu, Chunhua Weng","doi":"","DOIUrl":"","url":null,"abstract":"<p><p>Topic modeling performs poorly on short phrases or sentences and ever-changing slang, which are common in social media, such as X, formerly known as Twitter. This study investigates whether concept annotation tools such as MetaMap can enable topic modeling at the semantic level. Using tweets mentioning \"hydroxychloroquine\" for a case study, we extracted 56,017 posted between 03/01/2020-12/31/2021. The tweets were run through MetaMap to encode concepts with UMLS Concept Unique Identifiers (CUIs) and then we used Latent Dirichlet Allocation (LDA) to identify the optimal model for two datasets: 1) tweets with the original text and 2) tweets with the replaced CUIs. We found that the MetaMap LDA models outperformed the non-MetaMap models in terms of coherence and representativeness and identified topics timely relevant to social and political discussions. We concluded that integrating MetaMap to standardize tweets through UMLS concepts improved semantic topic modeling performance amidst noise in the text.</p>","PeriodicalId":72181,"journal":{"name":"AMIA Joint Summits on Translational Science proceedings. AMIA Joint Summits on Translational Science","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-05-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11141808/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141201742","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Prosanta Barai, Gondy Leroy, Prakash Bisht, Joshua M Rothman, Sumi Lee, Jennifer Andrews, Sydney A Rice, Arif Ahmed
Large Language Models (LLMs) have demonstrated immense potential in artificial intelligence across various domains, including healthcare. However, their efficacy is hindered by the need for high-quality labeled data, which is often expensive and time-consuming to create, particularly in low-resource domains like healthcare. To address these challenges, we propose a crowdsourcing (CS) framework enriched with quality control measures at the pre-, real-time-, and post-data-gathering stages. Our study evaluated the effectiveness of these data quality enhancements through their impact on an LLM (Bio-BERT) for predicting autism-related symptoms. The results show that real-time quality control improves data quality by 19% compared to pre-quality control. Fine-tuning Bio-BERT using crowdsourced data generally increased recall compared to the Bio-BERT baseline but lowered precision. Our findings highlight the potential of crowdsourcing and quality control in resource-constrained environments and offer insights into optimizing healthcare LLMs for informed decision-making and improved patient care.
{"title":"Crowdsourcing with Enhanced Data Quality Assurance: An Efficient Approach to Mitigate Resource Scarcity Challenges in Training Large Language Models for Healthcare.","authors":"Prosanta Barai, Gondy Leroy, Prakash Bisht, Joshua M Rothman, Sumi Lee, Jennifer Andrews, Sydney A Rice, Arif Ahmed","doi":"","DOIUrl":"","url":null,"abstract":"<p><p>Large Language Models (LLMs) have demonstrated immense potential in artificial intelligence across various domains, including healthcare. However, their efficacy is hindered by the need for high-quality labeled data, which is often expensive and time-consuming to create, particularly in low-resource domains like healthcare. To address these challenges, we propose a crowdsourcing (CS) framework enriched with quality control measures at the pre-, real-time-, and post-data gathering stages. Our study evaluated the effectiveness of enhancing data quality through its impact on LLMs (Bio-BERT) for predicting autism-related symptoms. The results show that real-time quality control improves data quality by 19% compared to pre-quality control. Fine-tuning Bio-BERT using crowdsourced data generally increased recall compared to the Bio-BERT baseline but lowered precision. Our findings highlighted the potential of crowdsourcing and quality control in resource-constrained environments and offered insights into optimizing healthcare LLMs for informed decision-making and improved patient care.</p>","PeriodicalId":72181,"journal":{"name":"AMIA Joint Summits on Translational Science proceedings. 
AMIA Joint Summits on Translational Science","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-05-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11141838/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141200175","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
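One common real-time quality-control step in crowdsourcing is to accept a label only when enough workers agree on it. A sketch of that idea; the agreement threshold and example labels are illustrative assumptions, not the paper's actual pipeline:

```python
# Sketch: majority-agreement filtering of crowdsourced labels, one possible
# real-time quality-control mechanism. Threshold and data are hypothetical.
from collections import Counter

def majority_filter(annotations, min_agreement=0.6):
    """annotations: {item_id: [worker labels]} -> {item_id: accepted label}."""
    accepted = {}
    for item, labels in annotations.items():
        label, count = Counter(labels).most_common(1)[0]
        if count / len(labels) >= min_agreement:   # enough workers agree
            accepted[item] = label
    return accepted

votes = {"sent-1": ["symptom", "symptom", "other"],
         "sent-2": ["symptom", "other"]}
filtered = majority_filter(votes)
print(filtered)   # sent-2 is dropped: only 50% agreement
```

Items that fail the threshold can be routed back to additional workers instead of being discarded, trading cost for coverage.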
V K Cody Bumgardner, Aaron Mullen, Samuel E Armstrong, Caylin Hickey, Victor Marek, Jeff Talbert
This paper introduces an approach that combines the language reasoning capabilities of large language models (LLMs) with the benefits of local training to tackle complex language tasks. The authors demonstrate their approach by extracting structured condition codes from pathology reports. The proposed approach utilizes local, fine-tuned LLMs to respond to specific generative instructions and provide structured outputs. Over 150k uncurated surgical pathology reports containing gross descriptions, final diagnoses, and condition codes were used. Different model architectures were trained and evaluated, including LLaMA, BERT, and LongFormer. The results show that the LLaMA-based models significantly outperform BERT-style models across all evaluated metrics. LLaMA models performed especially well with large datasets, demonstrating their ability to handle complex, multi-label tasks. Overall, this work presents an effective approach for utilizing LLMs to perform structured generative tasks on domain-specific language in the medical domain.
{"title":"Local Large Language Models for Complex Structured Tasks.","authors":"V K Cody Bumgardner, Aaron Mullen, Samuel E Armstrong, Caylin Hickey, Victor Marek, Jeff Talbert","doi":"","DOIUrl":"","url":null,"abstract":"<p><p>This paper introduces an approach that combines the language reasoning capabilities of large language models (LLMs) with the benefits of local training to tackle complex language tasks. The authors demonstrate their approach by extracting structured condition codes from pathology reports. The proposed approach utilizes local, fine-tuned LLMs to respond to specific generative instructions and provide structured outputs. Over 150k uncurated surgical pathology reports containing gross descriptions, final diagnoses, and condition codes were used. Different model architectures were trained and evaluated, including LLaMA, BERT, and LongFormer. The results show that the LLaMA-based models significantly outperform BERT-style models across all evaluated metrics. LLaMA models performed especially well with large datasets, demonstrating their ability to handle complex, multi-label tasks. Overall, this work presents an effective approach for utilizing LLMs to perform structured generative tasks on domain-specific language in the medical domain.</p>","PeriodicalId":72181,"journal":{"name":"AMIA Joint Summits on Translational Science proceedings. AMIA Joint Summits on Translational Science","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-05-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11141822/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141201185","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}