Vital signs are crucial in intensive care units (ICUs). They are used to track the patient's state and to identify clinically significant changes. Predicting vital sign trajectories is valuable for early detection of adverse events. However, conventional machine learning metrics like RMSE often fail to capture the true clinical relevance of such predictions. We introduce novel vital sign prediction performance metrics that align with clinical contexts, focusing on deviations from clinical norms, overall trends, and trend deviations. These metrics are derived from empirical utility curves obtained in a previous study through interviews with ICU clinicians. We validate the metrics' usefulness using simulated and real clinical datasets (MIMIC and eICU). Furthermore, we employ these metrics as loss functions for neural networks, resulting in models that excel in predicting clinically significant events. This research paves the way for clinically relevant machine learning model evaluation and optimization, promising to improve ICU patient care.
Bar Eini-Porat, Danny Eytan, Uri Shalit. "Aiming for Relevance." AMIA Joint Summits on Translational Science proceedings. AMIA Joint Summits on Translational Science, vol. 2024, pp. 145-154, 2024-05-31. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11141809/pdf/
Weishen Pan, Chang Su, Jacqueline R M A Maasch, Kun Chen, Claire Henchcliffe, Fei Wang
Parkinson's disease (PD) is associated with multiple clinical motor and non-motor manifestations. Understanding of PD etiologies has been informed by a growing number of genetic mutations and various fluid-based and brain imaging biomarkers. However, the mechanisms underlying its varied phenotypic features remain elusive. The present work introduces a data-driven approach for generating phenotypic association graphs for PD cohorts. Data collected by the Parkinson's Progression Markers Initiative (PPMI), the Parkinson's Disease Biomarkers Program (PDBP), and the Fox Investigation for New Discovery of Biomarkers (BioFIND) were analyzed by this approach to identify heterogeneous and longitudinal phenotypic associations that may provide insight into the pathology of this complex disease. Findings based on the phenotypic association graphs could improve understanding of longitudinal PD pathologies and how these relate to patient symptomology.
Weishen Pan, Chang Su, Jacqueline R M A Maasch, Kun Chen, Claire Henchcliffe, Fei Wang. "Learning Phenotypic Associations for Parkinson's Disease with Longitudinal Clinical Records." AMIA Joint Summits on Translational Science proceedings. AMIA Joint Summits on Translational Science, vol. 2024, pp. 374-383, 2024-05-31. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11141836/pdf/
Fangyi Chen, Gongbo Zhang, Si Chen, Tiffany Callahan, Chunhua Weng
Clinical notes are full of ambiguous medical abbreviations. Recent learning-based approaches have leveraged contextual knowledge for sense disambiguation. Previous findings indicated that structural elements of clinical notes carry useful characteristics for informing different interpretations of abbreviations, yet they remain underutilized and have not been fully investigated. To the best of our knowledge, the only study exploring note structures simply enumerated the headers in the notes, a representation that is not semantically meaningful. This paper describes a learning-based approach using the note structure represented by the semantic types predefined in the Unified Medical Language System (UMLS). We evaluated the representation in addition to the widely used N-gram features with three learning models on two different datasets. Experiments indicate that our feature augmentation consistently improved model performance for abbreviation disambiguation, with a best F1 score of 0.93.
Fangyi Chen, Gongbo Zhang, Si Chen, Tiffany Callahan, Chunhua Weng. "Clinical Note Structural Knowledge Improves Word Sense Disambiguation." AMIA Joint Summits on Translational Science proceedings. AMIA Joint Summits on Translational Science, vol. 2024, pp. 515-524, 2024-05-31. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11141859/pdf/
V K Cody Bumgardner, Aaron Mullen, Samuel E Armstrong, Caylin Hickey, Victor Marek, Jeff Talbert
This paper introduces an approach that combines the language reasoning capabilities of large language models (LLMs) with the benefits of local training to tackle complex language tasks. The authors demonstrate their approach by extracting structured condition codes from pathology reports. The proposed approach utilizes local, fine-tuned LLMs to respond to specific generative instructions and provide structured outputs. Over 150k uncurated surgical pathology reports containing gross descriptions, final diagnoses, and condition codes were used. Different model architectures were trained and evaluated, including LLaMA, BERT, and LongFormer. The results show that the LLaMA-based models significantly outperform BERT-style models across all evaluated metrics. LLaMA models performed especially well with large datasets, demonstrating their ability to handle complex, multi-label tasks. Overall, this work presents an effective approach for utilizing LLMs to perform structured generative tasks on domain-specific language in the medical domain.
V K Cody Bumgardner, Aaron Mullen, Samuel E Armstrong, Caylin Hickey, Victor Marek, Jeff Talbert. "Local Large Language Models for Complex Structured Tasks." AMIA Joint Summits on Translational Science proceedings. AMIA Joint Summits on Translational Science, vol. 2024, pp. 105-114, 2024-05-31. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11141822/pdf/
Genome-wide association studies (GWAS) remain a popular method for identifying novel genetic associations with human phenotypes and have provided many insights into disease etiology. However, GWAS provide limited support for how a genetic association might contribute to disease due to inherent limitations, such as linkage disequilibrium. As such, many methods that operate on GWAS summary statistics have been developed to generate evidence for functional pathways or for variants of interest, but they require defining the genomic region bounds for loci of interest. At present, there are few methods for determining these bounds in a rigorous, reproducible way. We present a novel statistical method, Statistical Analysis for Bayesian Estimation of Regions (SABER), that uses Bayesian Gaussian mixture models to reproducibly generate ratios that quantify whether particular genomic positions represent the bounds of loci of interest and can be used to delineate genomic regions for downstream analyses.
Rachit Kumar, Rasika Venkatesh, Marylyn D Ritchie. "SABER: Statistical Identification of Loci of Interest in GWAS Summary Statistics using a Bayesian Gaussian Mixture Model." AMIA Joint Summits on Translational Science proceedings. AMIA Joint Summits on Translational Science, vol. 2024, pp. 575-583, 2024-05-31. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11141805/pdf/
Frederick H Xu, Michael Gao, Jiong Chen, Sumita Garai, Duy Anh Duong-Tran, Yize Zhao, Li Shen
Alzheimer's disease is a progressive neurodegenerative disease with many identifying biomarkers for diagnosis. However, whole-brain phenomena, particularly in functional MRI modalities, are neither fully understood nor fully characterized. Here we apply topological data analysis (TDA)-based methods of persistent homology to functional brain networks from the ADNI-3 cohort to perform a subtyping experiment using unsupervised clustering techniques. We then investigate variations in QT-PAD challenge features across the identified clusters. Using a Wasserstein distance kernel with a variety of clustering algorithms, we found that the 0th-homology Wasserstein distance kernel and spectral clustering yielded clusters with significant differences in whole brain and medial temporal lobe (MTL) volume, thus demonstrating an intrinsic link between whole brain functional topology and brain morphometric structure. These findings demonstrate the importance of the MTL in functional connectivity and the efficacy of using TDA-based machine learning methods in network neuroscience and neurodegenerative disease subtyping.
Frederick H Xu, Michael Gao, Jiong Chen, Sumita Garai, Duy Anh Duong-Tran, Yize Zhao, Li Shen. "Topology-based Clustering of Functional Brain Networks in an Alzheimer's Disease Cohort." AMIA Joint Summits on Translational Science proceedings. AMIA Joint Summits on Translational Science, vol. 2024, pp. 449-458, 2024-05-31. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11141857/pdf/
Jeffrey Zhang, Maxwell Wibert, Huixue Zhou, Xueqing Peng, Qingyu Chen, Vipina K Keloth, Yan Hu, Rui Zhang, Hua Xu, Kalpana Raja
Relation Extraction (RE) is a natural language processing (NLP) task for extracting semantic relations between biomedical entities. Recent developments in pre-trained large language models (LLMs) have motivated NLP researchers to use them for various NLP tasks. We investigated GPT-3.5-turbo and GPT-4 on extracting the relations from three standard datasets: EU-ADR, Gene Associations Database (GAD), and ChemProt. Unlike existing approaches that use datasets with masked entities, we used three versions of each dataset for our experiments: a version with masked entities, a second version with the original entities (unmasked), and a third version with abbreviations replaced with the original terms. We developed prompts for the various versions and used the chat completion model from the GPT API. Our approach achieved an F1-score of 0.498 to 0.809 for GPT-3.5-turbo, and a highest F1-score of 0.84 for GPT-4. For certain experiments, the performance of GPT, BioBERT, and PubMedBERT is almost the same.
Jeffrey Zhang, Maxwell Wibert, Huixue Zhou, Xueqing Peng, Qingyu Chen, Vipina K Keloth, Yan Hu, Rui Zhang, Hua Xu, Kalpana Raja. "A Study of Biomedical Relation Extraction Using GPT Models." AMIA Joint Summits on Translational Science proceedings. AMIA Joint Summits on Translational Science, vol. 2024, pp. 391-400, 2024-05-31. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11141827/pdf/
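The three dataset versions described in the relation extraction abstract (masked, unmasked, and abbreviation-expanded) can be illustrated with a minimal sketch. The abstract does not give the preprocessing details, so everything here is an assumption made for illustration: the `@TYPE$` placeholder format, the example sentence, the entity and abbreviation dictionaries, and the helper names.

```python
import re

def replace_whole_words(text, replacements):
    """Replace each key of `replacements` wherever it occurs as a whole word."""
    for old, new in replacements.items():
        # A callable replacement avoids any escape processing of `new`.
        text = re.sub(rf"\b{re.escape(old)}\b", lambda _m, new=new: new, text)
    return text

def mask_entities(sentence, entities):
    """Replace entity mentions with typed placeholders (assumed format: @TYPE$)."""
    placeholders = {m: f"@{t.upper()}$" for m, t in entities.items()}
    return replace_whole_words(sentence, placeholders)

def expand_abbreviations(sentence, abbreviations):
    """Replace abbreviations with their long forms."""
    return replace_whole_words(sentence, abbreviations)

# Hypothetical example sentence and annotations, not taken from any dataset.
sentence = "BRCA1 mutations are associated with BC risk."
entities = {"BRCA1": "gene", "BC": "disease"}
abbreviations = {"BC": "breast cancer"}

versions = {
    "masked": mask_entities(sentence, entities),
    "unmasked": sentence,
    "expanded": expand_abbreviations(sentence, abbreviations),
}
```

Each entry of `versions` would then be placed into a prompt for the chat model. Whole-word matching is used so that a short abbreviation is not replaced inside a longer token.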
SNOMED CT is the most comprehensive clinical terminology employed worldwide, and enhancing its accuracy is of utmost importance. In this work, we introduce an automated approach to identifying erroneous IS-A relations in SNOMED CT. We first extract linked concept-pairs, from which we generate Term Difference Pairs (TDPs) that contain differences between the concepts. Given a TDP, if the reversed TDP also exists and the number of linked-pairs generating this TDP is smaller than the number generating the reversed TDP, then we suggest the former linked-pairs as potentially erroneous IS-A relations. We applied this approach to the Clinical finding and Procedure subhierarchies of the March 2022 US Edition of SNOMED CT, and obtained 52 potentially erroneous IS-A relations and a candidate list of 48 linked-pairs. A domain expert confirmed that 41 out of 52 (78.8%) are valid and identified 26 erroneous IS-A relations out of the 48 linked-pairs, demonstrating the effectiveness of the approach.
Ran Hu, Jay Shi, Licong Cui, Rashmie Abeysinghe. "An Automated Approach for Identifying Erroneous IS-A Relations in SNOMED CT." AMIA Joint Summits on Translational Science proceedings. AMIA Joint Summits on Translational Science, vol. 2024, pp. 545-554, 2024-05-31. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11141797/pdf/
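The reversed-TDP rule in the SNOMED CT abstract can be sketched in a few lines. This is an illustrative reading rather than the authors' implementation. It assumes a TDP is the pair (words found only in the child name, words found only in the parent name), that names are tokenized on whitespace, and that the input is a list of (child, parent) name pairs. The function names and the toy pairs are hypothetical and are not SNOMED CT content.

```python
from collections import defaultdict

def term_difference_pair(child_name, parent_name):
    """Return (words only in the child name, words only in the parent name)."""
    child_words = set(child_name.lower().split())
    parent_words = set(parent_name.lower().split())
    return (frozenset(child_words - parent_words),
            frozenset(parent_words - child_words))

def suspicious_links(linked_pairs):
    """Flag linked pairs whose TDP is generated less often than its reverse."""
    by_tdp = defaultdict(list)
    for child, parent in linked_pairs:
        by_tdp[term_difference_pair(child, parent)].append((child, parent))

    flagged = []
    for (only_child, only_parent), links in by_tdp.items():
        # .get() avoids inserting new keys into the defaultdict while iterating.
        reverse_links = by_tdp.get((only_parent, only_child), [])
        if reverse_links and len(links) < len(reverse_links):
            flagged.extend(links)
    return flagged

# Toy (child, parent) pairs invented for illustration.
toy_pairs = [
    ("acute kidney injury", "kidney injury"),   # TDP: ({acute}, {})
    ("acute liver failure", "liver failure"),   # TDP: ({acute}, {})
    ("heart disease", "acute heart disease"),   # TDP: ({}, {acute}), the rarer direction
]
flagged = suspicious_links(toy_pairs)
```

In the toy data the difference "acute" appears on the child side twice and on the parent side once, so only the third pair is flagged for expert review.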
Topic modeling performs poorly on short phrases or sentences and ever-changing slang, which are common on social media platforms such as X, formerly known as Twitter. This study investigates whether concept annotation tools such as MetaMap can enable topic modeling at the semantic level. Using tweets mentioning "hydroxychloroquine" as a case study, we extracted 56,017 tweets posted between 03/01/2020 and 12/31/2021. The tweets were run through MetaMap to encode concepts with UMLS Concept Unique Identifiers (CUIs), and we then used Latent Dirichlet Allocation (LDA) to identify the optimal model for two datasets: 1) tweets with the original text and 2) tweets with concepts replaced by CUIs. We found that the MetaMap LDA models outperformed the non-MetaMap models in terms of coherence and representativeness and identified topics that were timely and relevant to social and political discussions. We concluded that integrating MetaMap to standardize tweets through UMLS concepts improved semantic topic modeling performance amidst noise in the text.
Rebecca Shyu, Chunhua Weng. "Enabling Semantic Topic Modeling on Twitter Using MetaMap." AMIA Joint Summits on Translational Science proceedings. AMIA Joint Summits on Translational Science, vol. 2024, pp. 670-678, 2024-05-31. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11141808/pdf/
Alzheimer's disease (AD) is the most prevalent neurodegenerative disease worldwide, with one in nine people over the age of 65 living with the disease in 2023. In this study, we used a phenome-wide association study (PheWAS) approach to identify cross-phenotype associations between previously identified genetic associations for AD and electronic health record (EHR) diagnoses from the UK Biobank (UKBB) (n=361,194 of European ancestry) and the eMERGE Network (n=105,108 of diverse ancestry). Based on 497 previously identified AD-associated variants from the Alzheimer's Disease Variant Portal (ADVP), we found significant associations primarily in immune and cardiac-related diseases in our PheWAS. Replicating variants have widespread impacts on immune genes in diverse tissue types. This study demonstrates the potential of using the PheWAS strategy to improve our understanding of AD progression as well as to identify potential drug repurposing opportunities for new treatment and disease prevention strategies.
Anni Moore, Marylyn D Ritchie. "Cross-phenotype associations between Alzheimer's Disease and its comorbidities may provide clues to progression." AMIA Joint Summits on Translational Science proceedings. AMIA Joint Summits on Translational Science, vol. 2024, pp. 623-631, 2024-05-31. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11141840/pdf/