The volume of information, and in particular personal information, generated each day is increasing at a staggering rate. The ability to leverage such information depends greatly on satisfying the many compliance and privacy regulations that are appearing all over the world. We present READI (Relation Extraction-based Approach for De-Identification), a utility-preserving framework for de-identifying unstructured documents. READI leverages Named Entity Recognition and Relation Extraction technology to improve the quality of entity detection, thus improving the overall quality of the data de-identification process. In this proof-of-concept study, we evaluate the proposed approach on two different datasets and compare it with existing state-of-the-art approaches. We show that READI notably reduces the number of false positives and improves the utility of the de-identified text.
Pragmatic De-Identification of Cross-Domain Unstructured Documents: A Utility-Preserving Approach with Relation Extraction Filtering. Liubov Nedoshivina, Anisa Halimi, Joao Bettencourt-Silva, Stefano Braghin. AMIA Joint Summits on Translational Science proceedings, 2024-05-31. Journal Article. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11141830/pdf/
Genome-wide association studies (GWAS) remain a popular method for identifying novel genetic associations with human phenotypes and have provided numerous insights into the etiology of many diseases. However, GWAS provide limited support for how a genetic association might contribute to disease due to inherent limitations, such as linkage disequilibrium. As such, many methods that operate on GWAS summary statistics have been developed to generate evidence for functional pathways or for variants of interest, but they require defining the genomic region bounds for loci of interest. At present, few methods exist for determining these bounds in a rigorous, reproducible way. We present a novel statistical method, Statistical Analysis for Bayesian Estimation of Regions (SABER), that uses Bayesian Gaussian mixture models to reproducibly generate ratios that quantify whether particular genomic positions represent the bounds of loci of interest. These ratios can be used to delineate genomic regions for downstream analyses.
SABER: Statistical Identification of Loci of Interest in GWAS Summary Statistics using a Bayesian Gaussian Mixture Model. Rachit Kumar, Rasika Venkatesh, Marylyn D Ritchie. AMIA Joint Summits on Translational Science proceedings, 2024-05-31. Journal Article. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11141805/pdf/
Frederick H Xu, Michael Gao, Jiong Chen, Sumita Garai, Duy Anh Duong-Tran, Yize Zhao, Li Shen
Alzheimer's disease is a progressive neurodegenerative disease with many identifying biomarkers for diagnosis. However, whole-brain phenomena, particularly in functional MRI modalities, are neither fully understood nor characterized. Here we present a novel application of topological data analysis (TDA)-based persistent homology methods to functional brain networks from the ADNI-3 cohort, performing a subtyping experiment with unsupervised clustering techniques. We then investigate variations in QT-PAD challenge features across the identified clusters. Using a Wasserstein distance kernel with a variety of clustering algorithms, we found that the 0th-homology Wasserstein distance kernel and spectral clustering yielded clusters with significant differences in whole brain and medial temporal lobe (MTL) volume, thus demonstrating an intrinsic link between whole brain functional topology and brain morphometric structure. These findings demonstrate the importance of the MTL in functional connectivity and the efficacy of using TDA-based machine learning methods in network neuroscience and neurodegenerative disease subtyping.
Topology-based Clustering of Functional Brain Networks in an Alzheimer's Disease Cohort. Frederick H Xu, Michael Gao, Jiong Chen, Sumita Garai, Duy Anh Duong-Tran, Yize Zhao, Li Shen. AMIA Joint Summits on Translational Science proceedings, 2024-05-31. Journal Article. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11141857/pdf/
Xiayuan Huang, Ross Kleiman, David Page, Scott Hebbring
We recently demonstrated that electronically constructed family pedigrees (e-pedigrees) have great value in epidemiologic research using electronic health record (EHR) data. It has long been accepted that family health history is a major predictor for a wide spectrum of diseases, reflecting shared effects of genetics, environment, and lifestyle. With the widespread digitalization of patient data via EHRs, there is an unprecedented opportunity to use machine learning algorithms to better predict disease risk. Although predictive models have previously been constructed for a few important diseases, we currently know very little about how accurately the risk for most diseases can be predicted. It is also unknown whether incorporating e-pedigrees into machine learning can improve the value of these models. In this study, we devised a family pedigree-driven high-throughput machine learning pipeline to simultaneously predict risks for thousands of diagnosis codes using thousands of input features. Models were built to predict future disease risk for three time windows using both Logistic Regression and XGBoost. For example, using XGBoost without e-pedigrees, we achieved average areas under the receiver operating characteristic curve (AUCs) of 0.82, 0.77, and 0.71 for 1, 6, and 24 months, respectively. When e-pedigree features were added to the XGBoost pipeline, AUCs increased to 0.83, 0.79, and 0.74 for the same three time periods, respectively. E-pedigrees similarly improved the predictions when using Logistic Regression. These results emphasize the potential value of incorporating family health history via e-pedigrees into machine learning without additional human effort.
Automated Family Histories Significantly Improve Risk Prediction in an EHR. Xiayuan Huang, Ross Kleiman, David Page, Scott Hebbring. AMIA Joint Summits on Translational Science proceedings, 2024-05-31. Journal Article. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11141855/pdf/
Jeffrey Zhang, Maxwell Wibert, Huixue Zhou, Xueqing Peng, Qingyu Chen, Vipina K Keloth, Yan Hu, Rui Zhang, Hua Xu, Kalpana Raja
Relation Extraction (RE) is a natural language processing (NLP) task for extracting semantic relations between biomedical entities. Recent developments in pre-trained large language models (LLMs) have motivated NLP researchers to use them for various NLP tasks. We investigated GPT-3.5-turbo and GPT-4 on extracting relations from three standard datasets: EU-ADR, Gene Associations Database (GAD), and ChemProt. Unlike existing approaches that use datasets with masked entities, we used three versions of each dataset in our experiments: a version with masked entities, a second version with the original entities (unmasked), and a third version with abbreviations replaced with the original terms. We developed prompts for the various versions and used the chat completion model from the GPT API. Our approach achieved F1-scores ranging from 0.498 to 0.809 for GPT-3.5-turbo, and a highest F1-score of 0.84 for GPT-4. For certain experiments, the performance of GPT, BioBERT, and PubMedBERT is almost the same.
A Study of Biomedical Relation Extraction Using GPT Models. Jeffrey Zhang, Maxwell Wibert, Huixue Zhou, Xueqing Peng, Qingyu Chen, Vipina K Keloth, Yan Hu, Rui Zhang, Hua Xu, Kalpana Raja. AMIA Joint Summits on Translational Science proceedings, 2024-05-31. Journal Article. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11141827/pdf/
SNOMED CT is the most comprehensive clinical terminology employed worldwide, and enhancing its accuracy is of utmost importance. In this work, we introduce an automated approach to identifying erroneous IS-A relations in SNOMED CT. We first extract linked concept-pairs, from which we generate Term Difference Pairs (TDPs) that contain the differences between the concepts. Given a TDP, if the reversed TDP also exists and the number of linked-pairs generating this TDP is smaller than the number generating the reversed TDP, then we suggest the former linked-pairs as potentially erroneous IS-A relations. We applied this approach to the Clinical finding and Procedure subhierarchies of the March 2022 US Edition of SNOMED CT, and obtained 52 potentially erroneous IS-A relations and a candidate list of 48 linked-pairs. A domain expert confirmed that 41 out of 52 (78.8%) are valid and identified 26 erroneous IS-A relations out of the 48 linked-pairs, demonstrating the effectiveness of the approach.
An Automated Approach for Identifying Erroneous IS-A Relations in SNOMED CT. Ran Hu, Jay Shi, Licong Cui, Rashmie Abeysinghe. AMIA Joint Summits on Translational Science proceedings, 2024-05-31. Journal Article. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11141797/pdf/
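The reversed-TDP comparison described in the SNOMED CT abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not say how a TDP is computed, so the sketch assumes a TDP is the pair (words only in the child term, words only in the parent term), and the function names `term_difference_pair` and `suspicious_is_a` as well as the toy terms are hypothetical.

```python
from collections import defaultdict


def term_difference_pair(child_term, parent_term):
    """Assumed TDP definition: (words only in child, words only in parent)."""
    child_words = set(child_term.lower().split())
    parent_words = set(parent_term.lower().split())
    return (
        tuple(sorted(child_words - parent_words)),
        tuple(sorted(parent_words - child_words)),
    )


def suspicious_is_a(linked_pairs):
    """Flag (child, parent) pairs whose TDP is rarer than its reversed TDP."""
    # Group linked pairs by the TDP they generate.
    by_tdp = defaultdict(list)
    for child, parent in linked_pairs:
        by_tdp[term_difference_pair(child, parent)].append((child, parent))

    flagged = []
    for (only_child, only_parent), pairs in by_tdp.items():
        # .get() avoids creating new keys while iterating over the dict.
        reversed_pairs = by_tdp.get((only_parent, only_child))
        if reversed_pairs and len(pairs) < len(reversed_pairs):
            flagged.extend(pairs)
    return flagged


# Toy (child, parent) pairs invented for illustration, not taken from SNOMED CT.
toy_pairs = [
    ("acute kidney injury", "kidney injury"),
    ("acute liver failure", "liver failure"),
    ("lung infection", "acute lung infection"),
]
flagged_pairs = suspicious_is_a(toy_pairs)
print(flagged_pairs)
```

In the toy data, two pairs add "acute" going from parent to child and one pair does the reverse, so only the minority direction is flagged for expert review.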
Coronary artery calcium (CAC), as assessed by computed tomography (CT), is a marker of subclinical coronary atherosclerosis. However, routine application of CAC scoring via CT is limited by high cost and restricted accessibility. An electrocardiogram (ECG) is a widely used, sensitive, cost-effective, non-invasive, and radiation-free diagnostic tool. If artificial intelligence (AI)-enabled ECGs could opportunistically detect CAC, they would be particularly beneficial for asymptomatic or subclinical populations, acting as an initial screening measure and paving the way for further confirmatory tests and preventive strategies, a step ahead of conventional practice. With this aim, we developed an AI-enabled ECG framework that not only predicts a CAC score ≥400 but also offers a visual explanation of the associated potential morphological ECG changes. We tested its efficacy on individuals undergoing health checkups, a group primarily comprising healthy or subclinical individuals. To ensure broader applicability, we performed external validation at a separate institution.
An Explainable Artificial Intelligence-enabled ECG Framework for the Prediction of Subclinical Coronary Atherosclerosis. Changho Han, Dukyong Yoon. AMIA Joint Summits on Translational Science proceedings, 2024-05-31. Journal Article. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11141849/pdf/
Abheet Singh Sachdeva, Avery Bell, Dr Jacob Furst, Dorothy A Kozlowski, Sonya Crabtree-Nelson, Daniela Raicu
Research studies have revealed an underappreciated relationship between intimate partner violence (IPV) and symptoms of traumatic brain injury (TBI) in survivors. Among IPV survivors, resulting TBIs are not always identified during emergency room visits. This demonstrates a need for a prescreening tool that identifies IPV survivors who should receive TBI screening. We present a model that measures similarity to clinical reports of confirmed TBI cases to identify whether a patient should be screened for TBI. This is done through an ensemble of three supervised learning classifiers that work in two distinct feature spaces. Individual classifiers are trained on clinical reports and then combined into an ensemble that needs only one positive label to indicate that a patient should be screened for TBI.
A Traumatic Brain Injury Prescreening Tool for Intimate Partner Violence Patients Using Initial Clinical Reports and Machine Learning. Abheet Singh Sachdeva, Avery Bell, Dr Jacob Furst, Dorothy A Kozlowski, Sonya Crabtree-Nelson, Daniela Raicu. AMIA Joint Summits on Translational Science proceedings, 2024-05-31. Journal Article. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11141795/pdf/
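The "one positive label is enough" rule described in the TBI prescreening abstract can be sketched in Python as a logical OR over the member classifiers. The abstract does not name the three classifiers or their feature spaces, so the keyword rules below are hypothetical stand-ins for trained models, and `should_screen` is an illustrative name rather than the authors' code.

```python
from typing import Callable, Sequence

# A classifier here is any function mapping a clinical report to True/False.
Classifier = Callable[[str], bool]


def should_screen(report: str, classifiers: Sequence[Classifier]) -> bool:
    """Recommend TBI screening if at least one classifier votes positive."""
    return any(clf(report) for clf in classifiers)


# Hypothetical placeholder rules; real members would be trained classifiers.
def mentions_head_injury(report: str) -> bool:
    return "head" in report.lower()


def mentions_loss_of_consciousness(report: str) -> bool:
    return "lost consciousness" in report.lower()


def mentions_dizziness(report: str) -> bool:
    return "dizz" in report.lower()


ENSEMBLE = [mentions_head_injury, mentions_loss_of_consciousness, mentions_dizziness]

print(should_screen("Patient reports she lost consciousness briefly.", ENSEMBLE))
print(should_screen("Laceration to left forearm.", ENSEMBLE))
```

An OR rule of this kind trades specificity for sensitivity, which suits a prescreening step whose positives are followed by a proper TBI screening.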
Vital signs are crucial in intensive care units (ICUs). They are used to track the patient's state and to identify clinically significant changes. Predicting vital sign trajectories is valuable for early detection of adverse events. However, conventional machine learning metrics like RMSE often fail to capture the true clinical relevance of such predictions. We introduce novel vital sign prediction performance metrics that align with clinical contexts, focusing on deviations from clinical norms, overall trends, and trend deviations. These metrics are derived from empirical utility curves obtained in a previous study through interviews with ICU clinicians. We validate the metrics' usefulness using simulated and real clinical datasets (MIMIC and eICU). Furthermore, we employ these metrics as loss functions for neural networks, resulting in models that excel in predicting clinically significant events. This research paves the way for clinically relevant machine learning model evaluation and optimization, promising to improve ICU patient care.
Aiming for Relevance. Bar Eini-Porat, Danny Eytan, Uri Shalit. AMIA Joint Summits on Translational Science proceedings, 2024-05-31. Journal Article. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11141809/pdf/
Weishen Pan, Chang Su, Jacqueline R M A Maasch, Kun Chen, Claire Henchcliffe, Fei Wang
Parkinson's disease (PD) is associated with multiple clinical motor and non-motor manifestations. Understanding of PD etiologies has been informed by a growing number of genetic mutations and various fluid-based and brain imaging biomarkers. However, the mechanisms underlying its varied phenotypic features remain elusive. The present work introduces a data-driven approach for generating phenotypic association graphs for PD cohorts. Data collected by the Parkinson's Progression Markers Initiative (PPMI), the Parkinson's Disease Biomarkers Program (PDBP), and the Fox Investigation for New Discovery of Biomarkers (BioFIND) were analyzed with this approach to identify heterogeneous and longitudinal phenotypic associations that may provide insight into the pathology of this complex disease. Findings based on the phenotypic association graphs could improve understanding of longitudinal PD pathologies and how these relate to patient symptomatology.
Learning Phenotypic Associations for Parkinson's Disease with Longitudinal Clinical Records. Weishen Pan, Chang Su, Jacqueline R M A Maasch, Kun Chen, Claire Henchcliffe, Fei Wang. AMIA Joint Summits on Translational Science proceedings, 2024-05-31. Journal Article. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11141836/pdf/