Latest articles from AMIA Joint Summits on Translational Science proceedings

Aiming for Relevance.
Bar Eini-Porat, Danny Eytan, Uri Shalit

Vital signs are crucial in intensive care units (ICUs). They are used to track the patient's state and to identify clinically significant changes. Predicting vital sign trajectories is valuable for early detection of adverse events. However, conventional machine learning metrics like RMSE often fail to capture the true clinical relevance of such predictions. We introduce novel vital sign prediction performance metrics that align with clinical contexts, focusing on deviations from clinical norms, overall trends, and trend deviations. These metrics are derived from empirical utility curves obtained in a previous study through interviews with ICU clinicians. We validate the metrics' usefulness using simulated and real clinical datasets (MIMIC and eICU). Furthermore, we employ these metrics as loss functions for neural networks, resulting in models that excel in predicting clinically significant events. This research paves the way for clinically relevant machine learning model evaluation and optimization, promising to improve ICU patient care.
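To make the contrast with plain RMSE concrete, here is a minimal sketch of an error metric that weights mistakes more heavily when the true value lies outside a clinical norm. It is not the authors' metric: the 60 to 100 beats-per-minute band, the linear weight, the `slope` parameter, and all function names are assumptions chosen for illustration, and the empirical utility curves from the clinician interviews are not reproduced here.

```python
import math


def norm_deviation_weight(value, low, high, slope):
    """Hypothetical weight: 1.0 inside [low, high], growing linearly outside it."""
    distance = max(low - value, 0.0, value - high)
    return 1.0 + slope * distance


def weighted_rmse(y_true, y_pred, low=60.0, high=100.0, slope=0.1):
    """RMSE in which each squared error is weighted by how abnormal the true value is."""
    weights = [norm_deviation_weight(t, low, high, slope) for t in y_true]
    weighted_sq = sum(w * (t - p) ** 2 for w, t, p in zip(weights, y_true, y_pred))
    return math.sqrt(weighted_sq / sum(weights))


def rmse(y_true, y_pred):
    """Conventional root mean squared error, for comparison."""
    squared = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    return math.sqrt(squared / len(y_true))


# Toy heart-rate values (assumed): the only error falls on the abnormal reading.
true_hr = [80.0, 120.0]
pred_hr = [80.0, 110.0]
print(rmse(true_hr, pred_hr), weighted_rmse(true_hr, pred_hr))
```

On this toy input the weighted score exceeds the plain RMSE because the single error falls on a reading outside the assumed normal band. A differentiable version of the same idea could serve as a training loss, which is the use the abstract describes.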

AMIA Joint Summits on Translational Science proceedings, vol. 2024, pp. 145-154. Journal Article. Published 2024-05-31. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11141809/pdf/
Citations: 0
Learning Phenotypic Associations for Parkinson's Disease with Longitudinal Clinical Records.
Weishen Pan, Chang Su, Jacqueline R M A Maasch, Kun Chen, Claire Henchcliffe, Fei Wang

Parkinson's disease (PD) is associated with multiple clinical motor and non-motor manifestations. Understanding of PD etiologies has been informed by a growing number of genetic mutations and various fluid-based and brain imaging biomarkers. However, the mechanisms underlying its varied phenotypic features remain elusive. The present work introduces a data-driven approach for generating phenotypic association graphs for PD cohorts. Data collected by the Parkinson's Progression Markers Initiative (PPMI), the Parkinson's Disease Biomarkers Program (PDBP), and the Fox Investigation for New Discovery of Biomarkers (BioFIND) were analyzed by this approach to identify heterogeneous and longitudinal phenotypic associations that may provide insight into the pathology of this complex disease. Findings based on the phenotypic association graphs could improve understanding of longitudinal PD pathologies and how these relate to patient symptomology.

AMIA Joint Summits on Translational Science proceedings, vol. 2024, pp. 374-383. Journal Article. Published 2024-05-31. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11141836/pdf/
Citations: 0
Clinical Note Structural Knowledge Improves Word Sense Disambiguation.
Fangyi Chen, Gongbo Zhang, Si Chen, Tiffany Callahan, Chunhua Weng

Clinical notes are full of ambiguous medical abbreviations. Recent learning-based approaches have leveraged contextual knowledge for sense disambiguation. Previous findings indicated that structural elements of clinical notes carry useful characteristics for informing different interpretations of abbreviations, yet they have remained underutilized and have not been fully investigated. To the best of our knowledge, the only study exploring note structures simply enumerated the headers in the notes, a representation that is not semantically meaningful. This paper describes a learning-based approach using the note structure represented by the semantic types predefined in the Unified Medical Language System (UMLS). We evaluated this representation in addition to the widely used N-gram features with three learning models on two different datasets. Experiments indicate that our feature augmentation consistently improved model performance for abbreviation disambiguation, with a best F1 score of 0.93.

AMIA Joint Summits on Translational Science proceedings, vol. 2024, pp. 515-524. Journal Article. Published 2024-05-31. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11141859/pdf/
Citations: 0
Local Large Language Models for Complex Structured Tasks.
V K Cody Bumgardner, Aaron Mullen, Samuel E Armstrong, Caylin Hickey, Victor Marek, Jeff Talbert

This paper introduces an approach that combines the language reasoning capabilities of large language models (LLMs) with the benefits of local training to tackle complex language tasks. The authors demonstrate their approach by extracting structured condition codes from pathology reports. The proposed approach utilizes local, fine-tuned LLMs to respond to specific generative instructions and provide structured outputs. Over 150k uncurated surgical pathology reports containing gross descriptions, final diagnoses, and condition codes were used. Different model architectures were trained and evaluated, including LLaMA, BERT, and LongFormer. The results show that the LLaMA-based models significantly outperform BERT-style models across all evaluated metrics. LLaMA models performed especially well with large datasets, demonstrating their ability to handle complex, multi-label tasks. Overall, this work presents an effective approach for utilizing LLMs to perform structured generative tasks on domain-specific language in the medical domain.

AMIA Joint Summits on Translational Science proceedings, vol. 2024, pp. 105-114. Journal Article. Published 2024-05-31. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11141822/pdf/
Citations: 0
SABER: Statistical Identification of Loci of Interest in GWAS Summary Statistics using a Bayesian Gaussian Mixture Model.
Rachit Kumar, Rasika Venkatesh, Marylyn D Ritchie

Genome-wide association studies (GWAS) remain a popular method for identifying novel genetic associations with human phenotypes and have provided many insights into the etiology of many diseases. However, GWAS provide limited support for how a genetic association might contribute to disease due to inherent limitations, such as linkage disequilibrium. As such, many methods that operate on GWAS summary statistics have been developed to generate evidence for functional pathways or for variants of interest, but they require defining the genomic region bounds for loci of interest. At present, there are limited methods for determining these bounds in a rigorous, reproducible way. We present a novel statistical method, Statistical Analysis for Bayesian Estimation of Regions (SABER), that uses Bayesian Gaussian mixture models to reproducibly generate ratios that quantify whether particular genomic positions represent the bounds of loci of interest and can be used to delineate genomic regions for downstream analyses.

AMIA Joint Summits on Translational Science proceedings, vol. 2024, pp. 575-583. Journal Article. Published 2024-05-31. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11141805/pdf/
Citations: 0
Topology-based Clustering of Functional Brain Networks in an Alzheimer's Disease Cohort.
Frederick H Xu, Michael Gao, Jiong Chen, Sumita Garai, Duy Anh Duong-Tran, Yize Zhao, Li Shen

Alzheimer's disease is a progressive neurodegenerative disease with many identifying biomarkers for diagnosis. However, whole-brain phenomena, particularly in functional MRI modalities, are neither fully understood nor fully characterized. Here we present a novel application of persistent homology, a topological data analysis (TDA) method, to functional brain networks from the ADNI-3 cohort to perform a subtyping experiment using unsupervised clustering techniques. We then investigate variations in QT-PAD challenge features across the identified clusters. Using a Wasserstein distance kernel with a variety of clustering algorithms, we found that the 0th-homology Wasserstein distance kernel combined with spectral clustering yielded clusters with significant differences in whole brain and medial temporal lobe (MTL) volume, thus demonstrating an intrinsic link between whole brain functional topology and brain morphometric structure. These findings demonstrate the importance of the MTL in functional connectivity and the efficacy of using TDA-based machine learning methods in network neuroscience and neurodegenerative disease subtyping.
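As a rough illustration of what 0th-homology persistence captures, the sketch below computes the death times of connected components for a network given as a symmetric distance matrix. It is an assumption-laden toy rather than the authors' pipeline: the matrix representation (for example, 1 minus correlation), the function name `h0_death_times`, and the example values are all hypothetical, and the Wasserstein kernel and clustering steps are not covered.

```python
def h0_death_times(dist):
    """Death times of 0-dimensional features for a distance-matrix filtration.

    Every node is born as its own component at scale 0. Processing edges in
    order of increasing distance, a component dies each time an edge merges
    two previously separate components (the Kruskal / union-find construction).
    One component never dies and is omitted from the result.
    """
    n = len(dist)
    parent = list(range(n))

    def find(i):
        # Find the component representative, compressing the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    edges = sorted((dist[i][j], i, j) for i in range(n) for j in range(i + 1, n))
    deaths = []
    for weight, i, j in edges:
        root_i, root_j = find(i), find(j)
        if root_i != root_j:
            parent[root_i] = root_j
            deaths.append(weight)
    return deaths


# Toy 3-region network (assumed values): regions 0 and 1 are closest.
toy_dist = [
    [0, 1, 3],
    [1, 0, 2],
    [3, 2, 0],
]
print(h0_death_times(toy_dist))
```

The sorted death times form the finite part of the 0th persistence diagram; diagrams from different subjects can then be compared with a distance such as the Wasserstein distance the abstract mentions.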

AMIA Joint Summits on Translational Science proceedings, vol. 2024, pp. 449-458. Journal Article. Published 2024-05-31. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11141857/pdf/
Citations: 0
A Study of Biomedical Relation Extraction Using GPT Models.
Jeffrey Zhang, Maxwell Wibert, Huixue Zhou, Xueqing Peng, Qingyu Chen, Vipina K Keloth, Yan Hu, Rui Zhang, Hua Xu, Kalpana Raja

Relation Extraction (RE) is a natural language processing (NLP) task for extracting semantic relations between biomedical entities. Recent developments in pre-trained large language models (LLMs) have motivated NLP researchers to use them for various NLP tasks. We investigated GPT-3.5-turbo and GPT-4 on extracting relations from three standard datasets: EU-ADR, Gene Associations Database (GAD), and ChemProt. Unlike existing approaches that use datasets with masked entities, we used three versions of each dataset in our experiments: a version with masked entities, a second version with the original (unmasked) entities, and a third version with abbreviations replaced by the original terms. We developed prompts for the various versions and used the chat completion model from the GPT API. Our approach achieved F1-scores of 0.498 to 0.809 for GPT-3.5-turbo and a highest F1-score of 0.84 for GPT-4. For certain experiments, the performance of GPT, BioBERT, and PubMedBERT is almost the same.
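The masked and unmasked dataset versions, and the F1-score used to report results, can be sketched as follows. This is an illustrative toy under stated assumptions: the placeholder strings `@GENE$` and `@DISEASE$`, the example sentence, and both function names are hypothetical, and no prompt or API call from the study is reproduced.

```python
def mask_entities(sentence, entities):
    """Replace each entity surface form with its placeholder (assumed format)."""
    for surface, placeholder in entities.items():
        sentence = sentence.replace(surface, placeholder)
    return sentence


def f1_score(gold, predicted, positive=1):
    """F1 for the positive class: harmonic mean of precision and recall."""
    pairs = list(zip(gold, predicted))
    tp = sum(1 for g, p in pairs if g == positive and p == positive)
    fp = sum(1 for g, p in pairs if g != positive and p == positive)
    fn = sum(1 for g, p in pairs if g == positive and p != positive)
    if tp == 0:
        return 0.0  # no true positives: precision or recall is zero or undefined
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical example sentence and labels, not taken from any dataset.
unmasked = "BRCA1 is associated with breast cancer."
masked = mask_entities(unmasked, {"BRCA1": "@GENE$", "breast cancer": "@DISEASE$"})
print(masked)
print(f1_score([1, 1, 0, 0], [1, 0, 1, 0]))
```

Running the same model on the masked and unmasked strings and scoring both with the same metric is the kind of comparison the abstract describes.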

AMIA Joint Summits on Translational Science proceedings, vol. 2024, pp. 391-400. Journal Article. Published 2024-05-31. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11141827/pdf/
Citations: 0
An Automated Approach for Identifying Erroneous IS-A Relations in SNOMED CT.
Ran Hu, Jay Shi, Licong Cui, Rashmie Abeysinghe

SNOMED CT is the most comprehensive clinical terminology employed worldwide, and enhancing its accuracy is of utmost importance. In this work, we introduce an automated approach to identifying erroneous IS-A relations in SNOMED CT. We first extract linked concept-pairs, from which we generate Term Difference Pairs (TDPs) that contain the differences between the concepts. Given a TDP, if the reversed TDP also exists and the number of linked-pairs generating this TDP is less than the number generating the reversed TDP, then we suggest the former linked-pairs as potentially erroneous IS-A relations. We applied this approach to the Clinical finding and Procedure subhierarchies of the March 2022 US Edition of SNOMED CT and obtained 52 potentially erroneous IS-A relations and a candidate list of 48 linked-pairs. A domain expert confirmed that 41 of the 52 (78.8%) are valid and identified 26 erroneous IS-A relations among the 48 linked-pairs, demonstrating the effectiveness of the approach.
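The counting rule described above can be sketched in a few lines. This is a simplified reading, not the authors' implementation: it assumes each linked pair is a (child, parent) IS-A pair, treats concept names as sets of lowercase words, and uses invented toy names rather than real SNOMED CT concepts; the function names are hypothetical.

```python
from collections import defaultdict


def term_difference_pair(child, parent):
    """Words found only in the child name, and words found only in the parent name."""
    child_words = set(child.lower().split())
    parent_words = set(parent.lower().split())
    return (frozenset(child_words - parent_words), frozenset(parent_words - child_words))


def suggest_erroneous(linked_pairs):
    """Flag pairs whose TDP is rarer than its reversed TDP."""
    groups = defaultdict(list)
    for child, parent in linked_pairs:
        groups[term_difference_pair(child, parent)].append((child, parent))

    suggestions = []
    for (only_child, only_parent), pairs in groups.items():
        # The reversed TDP swaps the child-only and parent-only word sets.
        reversed_pairs = groups.get((only_parent, only_child))
        if reversed_pairs and len(pairs) < len(reversed_pairs):
            suggestions.extend(pairs)
    return suggestions


# Toy (child, parent) pairs, invented for illustration only.
toy_pairs = [
    ("viral infection", "viral disease"),
    ("bacterial infection", "bacterial disease"),
    ("fungal disease", "fungal infection"),
]
print(suggest_erroneous(toy_pairs))
```

In the toy data, two pairs go from "infection" to "disease" and one goes the other way, so the minority direction is flagged for expert review. The real method would still need a domain expert to confirm each suggestion, as the abstract reports.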

AMIA Joint Summits on Translational Science proceedings, vol. 2024, pp. 545-554. Journal Article. Published 2024-05-31. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11141797/pdf/
Citations: 0
Enabling Semantic Topic Modeling on Twitter Using MetaMap.
Rebecca Shyu, Chunhua Weng

Topic modeling performs poorly on short phrases or sentences and ever-changing slang, which are common in social media, such as X, formerly known as Twitter. This study investigates whether concept annotation tools such as MetaMap can enable topic modeling at the semantic level. Using tweets mentioning "hydroxychloroquine" as a case study, we extracted 56,017 tweets posted between 03/01/2020 and 12/31/2021. The tweets were run through MetaMap to encode concepts with UMLS Concept Unique Identifiers (CUIs), and we then used Latent Dirichlet Allocation (LDA) to identify the optimal model for two datasets: 1) tweets with the original text and 2) tweets with concepts replaced by CUIs. We found that the MetaMap LDA models outperformed the non-MetaMap models in terms of coherence and representativeness and identified topics that were timely and relevant to social and political discussions. We concluded that integrating MetaMap to standardize tweets through UMLS concepts improved semantic topic modeling performance amidst noise in the text.

AMIA Joint Summits on Translational Science proceedings, vol. 2024, pp. 670-678. Journal Article. Published 2024-05-31. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11141808/pdf/
Citations: 0
Cross-phenotype associations between Alzheimer's Disease and its comorbidities may provide clues to progression.
Anni Moore, Marylyn D Ritchie

Alzheimer's disease (AD) is the most prevalent neurodegenerative disease worldwide, with one in nine people over the age of 65 living with the disease in 2023. In this study, we used a phenome-wide association study (PheWAS) approach to identify cross-phenotype associations between previously identified genetic associations for AD and electronic health record (EHR) diagnoses from the UK Biobank (UKBB) (n=361,194 of European ancestry) and the eMERGE Network (n=105,108 of diverse ancestry). Based on 497 previously identified AD-associated variants from the Alzheimer's Disease Variant Portal (ADVP), we found significant associations primarily in immune and cardiac related diseases in our PheWAS. Replicating variants have widespread impacts on immune genes in diverse tissue types. This study demonstrates the potential of using the PheWAS strategy to improve our understanding of AD progression as well as identify potential drug repurposing opportunities for new treatment and disease prevention strategies.

AMIA Joint Summits on Translational Science proceedings, vol. 2024, pp. 623-631. Published 2024-05-31. Journal Article. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11141840/pdf/
Citations: 0