SNOMED CT is extensively employed to standardize data across diverse patient datasets and support cohort identification, with studies revealing its benefits and challenges. In this work, we developed a SNOMED CT-driven cohort query system over a heterogeneous Optum® de-identified COVID-19 Electronic Health Record dataset, leveraging concept mappings between ICD-9-CM/ICD-10-CM and SNOMED CT. We evaluated the benefits and challenges of using SNOMED CT to perform cohort queries based on both query code sets and actual patients retrieved from the database, using the original ICD-9-CM and ICD-10-CM as baselines. Manual review of 80 random cases revealed 65 cases containing 148 true positive codes and 25 cases containing 63 false positive codes. The manual evaluation also revealed issues in code naming, mappings, and hierarchical relations. Overall, our study indicates that while the SNOMED CT-driven query system holds considerable promise for comprehensive cohort queries, careful attention must be given to the challenges of falsely included codes and patients.
"Leveraging SNOMED CT for patient cohort identification over heterogeneous EHR data." Xubing Hao, Yan Huang, Licong Cui, Xiaojin Li. AMIA Joint Summits on Translational Science Proceedings, 2025, pp. 205-214. Published 2025-06-10. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12150708/pdf/
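The mapping-driven query expansion described in the abstract above can be sketched as follows. This is a hypothetical illustration: the mapping table, the IS-A links, and the code values are toy examples, not actual SNOMED CT release content or the authors' implementation.

```python
# Toy ICD-10-CM -> SNOMED CT concept map (real maps are far larger
# and can be one-to-many); all codes here are illustrative only.
ICD_TO_SNOMED = {
    "I10": ["38341003"],    # hypertensive disorder (illustrative)
    "E11.9": ["44054006"],  # type 2 diabetes mellitus (illustrative)
}

# Toy SNOMED CT IS-A hierarchy: child concept -> parent concept.
SNOMED_PARENT = {
    "59621000": "38341003",  # essential hypertension IS-A hypertensive disorder
}

def descendants(code):
    """All concepts below `code` in the toy IS-A hierarchy."""
    out = set()
    for child, parent in SNOMED_PARENT.items():
        if parent == code:
            out |= {child} | descendants(child)
    return out

def expand_cohort_query(icd_codes):
    """Map ICD codes to SNOMED concepts, then add all subsumed concepts."""
    seeds = {s for c in icd_codes for s in ICD_TO_SNOMED.get(c, [])}
    expanded = set(seeds)
    for seed in seeds:
        expanded |= descendants(seed)
    return expanded
```

Querying on the expanded concept set is what surfaces patients coded only with more specific concepts; it is also where falsely included codes can enter, which is why the manual review step matters.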
Mehmet F Bagci, Samantha R Spierling, Anna L Ritko, Truong Nguyen, Brian D Modena, Yusuf Ozturk
Variations in laboratory test names across healthcare systems-stemming from inconsistent terminologies, abbreviations, misspellings, and assay vendors-pose significant challenges to the integration and analysis of clinical data. These discrepancies hinder interoperability and complicate efforts to extract meaningful insights for both clinical research and patient care. In this study, we propose a machine learning-driven solution, enhanced by natural language processing techniques, to standardize lab test names. By employing feature extraction methods that analyze both string similarity and the distributional properties of test results, we improve the harmonization of test names, resulting in a more robust dataset. Our model achieves a 99% accuracy rate in matching lab names, showcasing the potential of AI-driven approaches in resolving long-standing standardization challenges. Importantly, this method enhances the reliability and consistency of clinical data, which is crucial for ensuring accurate results in large-scale clinical studies and improving the overall efficiency of informatics-based research and diagnostics.
"Enhancing Healthcare Data Integration: A Machine Learning Approach to Harmonizing Laboratory Labels." AMIA Joint Summits on Translational Science Proceedings, 2025, pp. 65-73. Published 2025-06-10. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12150698/pdf/
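The string-similarity side of a lab-name matcher like the one described above can be sketched with the standard library. The canonical vocabulary, normalization, and threshold below are assumptions for illustration; the paper's method additionally uses distributional features of the test results.

```python
from difflib import SequenceMatcher

# Assumed canonical vocabulary for illustration only.
CANONICAL_NAMES = ["hemoglobin", "hematocrit", "glucose", "creatinine"]

def normalize(name: str) -> str:
    """Lowercase and replace punctuation with spaces so surface variants align."""
    return " ".join("".join(ch if ch.isalnum() else " " for ch in name.lower()).split())

def best_match(raw_name: str, threshold: float = 0.6):
    """Return the most similar canonical name, or None if below threshold."""
    raw = normalize(raw_name)
    score, match = max((SequenceMatcher(None, raw, c).ratio(), c) for c in CANONICAL_NAMES)
    return match if score >= threshold else None
```

In a full system these similarity scores would be one feature among several fed to a classifier, rather than a hard decision rule.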
Dipak P Upadhyaya, Katrina Prantzalos, Pedram Golnari, Aasef G Shaikh, Subhashini Sivagnanam, Amitava Majumdar, Fatema F Ghasia, Satya S Sahoo
Amblyopia is a neurodevelopmental disorder affecting children's visual acuity that requires early diagnosis for effective treatment. Traditional diagnostic methods rely on subjective evaluations of recordings from high-fidelity eye tracking instruments performed by specialized pediatric ophthalmologists, who are often unavailable in rural, low-resource clinics. As such, there is an urgent need for a scalable, low-cost, high-accuracy approach to automatically analyze eye tracking recordings. Large language models (LLMs) show promise in accurate detection of amblyopia; our prior work has shown that the Google Gemini model, guided by expert ophthalmologists, can distinguish control and amblyopic subjects from eye tracking recordings. However, there is a clear need to address the issues of transparency and trust in medical applications of LLMs. To bolster the reliability and interpretability of LLM analysis of eye tracking records, we developed a Feature Guided Interprative Prompting (FGIP) framework focused on critical clinical features. Using the Google Gemini model, we classify high-fidelity eye-tracking data to detect amblyopia in children and apply the Quantus framework to evaluate the classification results across key metrics (faithfulness, robustness, localization, and complexity). These metrics provide a quantitative basis for understanding the model's decision-making process. This work presents the first implementation of an Explainable Artificial Intelligence (XAI) framework to systematically characterize the results generated by the Gemini model using high-fidelity eye-tracking data to detect amblyopia in children. Results demonstrated that the model accurately classified control and amblyopic subjects, including those with nystagmus, while maintaining transparency and clinical alignment. The results of this study support the development of a scalable and interpretable clinical decision support (CDS) tool using LLMs that has the potential to enhance the trustworthiness of AI applications.
"Explainable Artificial Intelligence (XAI) in the Era of Large Language Models: Applying an XAI Framework in Pediatric Ophthalmology Diagnosis using the Gemini Model." AMIA Joint Summits on Translational Science Proceedings, 2025, pp. 566-575. Published 2025-06-10. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12150742/pdf/
Magnetic Resonance Imaging (MRI) is a crucial diagnostic tool in medicine, widely used to detect and assess various health conditions. Different MRI sequences, such as T1-weighted, T2-weighted, and FLAIR, serve distinct roles by highlighting different tissue characteristics and contrasts. However, distinguishing them based solely on the description file is currently impossible due to confusing or incorrect annotations. Additionally, there is a notable lack of effective tools to differentiate these sequences. In response, we developed a deep learning-based toolkit tailored for small, unrefined MRI datasets. This toolkit enables precise sequence classification and delivers performance comparable to systems trained on large, meticulously curated datasets. Utilizing lightweight model architectures and incorporating a voting ensemble method, the toolkit enhances accuracy and stability. It achieves a 99% accuracy rate using only 10% of the data typically required in other research. The code is available at https://github.com/JinqianPan/MRISeqClassifier.
"MRISeqClassifier: A Deep Learning Toolkit for Precise MRI Sequence Classification." Jinqian Pan, Qi Chen, Chengkun Sun, Renjie Liang, Jiang Bian, Jie Xu. AMIA Joint Summits on Translational Science Proceedings, 2025, pp. 405-413. Published 2025-06-10. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12150705/pdf/
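The voting-ensemble step mentioned in the abstract reduces, in its simplest majority form, to a sketch like the following. The stand-in classifiers here are placeholders, not the toolkit's trained networks; see the linked repository for the actual code.

```python
from collections import Counter

def majority_vote(labels):
    """Most frequent label wins; ties resolve to the earliest-seen label."""
    return Counter(labels).most_common(1)[0][0]

def ensemble_predict(models, image):
    """Collect each model's sequence label for the image, then vote."""
    return majority_vote([model(image) for model in models])

# Stand-in classifiers, each mapping an image to an MRI sequence label.
model_a = lambda image: "T1"
model_b = lambda image: "T1"
model_c = lambda image: "FLAIR"
```

Voting over several lightweight models is a common way to stabilize predictions when each individual model is trained on a small dataset.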
Nattanit Songthangtham, Ratchada Jantraporn, Elizabeth Weinfurter, Gyorgy Simon, Wei Pan, Sripriya Rajamani, Steven G Johnson
Assessing how accurately a cohort extracted from Electronic Health Records (EHR) represents the intended target population, or cohort fitness, is critical but often overlooked in secondary EHR data use. This scoping review aimed to (1) identify guidelines for assessing cohort fitness and (2) determine their thoroughness by examining whether they offer sufficient detail and computable methods for researchers. This scoping review follows the JBI guidance for scoping reviews and is refined based on the Preferred Reporting Items for Systematic reviews and Meta-Analyses extension for scoping reviews (PRISMA-ScR) checklists. Searches were performed in Medline, Embase, and Scopus. From 1,904 results, 30 articles and 2 additional references were reviewed. Nine articles (28.13%) include a framework for evaluating cohort fitness, but only 5 (15.63%) contain sufficient details and quantitative methodologies. Overall, a more comprehensive guideline that provides best practices for measuring cohort fitness is still needed.
"A Standardized Guideline for Assessing Extracted Electronic Health Records Cohorts: A Scoping Review." AMIA Joint Summits on Translational Science Proceedings, 2025, pp. 527-536. Published 2025-06-10. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12150730/pdf/
Aileen S Gabriel, Te-Yi Tsai, C Mahony Reategui-Rivera, Patricia Rocco, Aref Smiley, Clayton Powers, Jeanette P Brown, Joseph Finkelstein
Postural Tachycardia Syndrome (POTS) is a chronic condition characterized by orthostatic intolerance and a significant rise in heart rate upon standing. Patients often experience debilitating symptoms, such as brain fog and chronic fatigue, which hinder daily functioning. Non-pharmacological management strategies, particularly pacing, are crucial for reducing symptom fluctuations and improving quality of life. Heart rate monitoring plays a key role in effective pacing, enabling patients to plan activities and prevent severe symptom onset. Recent technological advancements have increased interest in wearable devices for managing chronic conditions. This study examines the feasibility of using wearable technology to support symptom management in POTS patients. Through an Exploratory-Descriptive Qualitative approach, five key themes emerged, including personalized management strategies and the beneficial impact of real-time feedback. The findings suggest that wearable devices can enhance self-management, improve communication with healthcare providers, and empower patients to take a more proactive approach to their care.
"Feasibility Assessment of a Wearable App to Manage Symptoms of Postural Orthostatic Tachycardia Syndrome Using Real-Time Heart Rate Monitoring." AMIA Joint Summits on Translational Science Proceedings, 2025, pp. 159-166. Published 2025-06-10. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12150731/pdf/
Jenna M Schabdach, Remo M S Williams, Joseph Logan, Viveknarayanan Padmanabhan, Russell D'Aiello Iii, Johnny Mclaughlin, Alexander Gonzalez, Edward M Krause, Gregory E Tasian, Susan Sotardi, Aaron F Alexander-Bloch
Growth in the field of medical imaging research has revealed a need for larger volume and variety in available data. This need could be met using curated clinically acquired data, but the process of getting this data from the scanners to the scientists is complex and lengthy. We present a manifest-driven, modular Extract, Transform, and Load (ETL) process named Locutus, designed to handle the difficulties inherent in reusing clinically acquired medical imaging data. The design of Locutus was based on four foundational assumptions about medical data, research data, and communication. All parts of a workflow must communicate with each other and be adaptable to unique data delivery requests. In addition, the workflow must be robust to possible errors and uncertainties in clinically acquired data, which may require human intervention to resolve. With these assumptions in mind, Locutus presents a five-phase workflow for downloading, deidentifying, and delivering unique requests for imaging data. The phases include initialization, data preparation, extraction of data from the research server to a pre-deidentification data warehouse, transformation into deidentified space, and loading into a post-deidentification data warehouse. To date, this workflow has been used to process 32,962 imaging accessions for research use. This number is expected to grow as technical challenges are addressed, and the role of humans is expected to shift from frequent intervention to regular monitoring.
"From Scanner to Science: Reusing Clinically Acquired Medical Images for Research." AMIA Joint Summits on Translational Science Proceedings, 2025, pp. 471-480. Published 2025-06-10. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12150695/pdf/
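The five phases named in the abstract can be sketched as a linear pipeline. The phase bodies and manifest fields below are placeholders for illustration, not Locutus's actual implementation.

```python
# Toy five-phase ETL pipeline: initialize -> prepare -> extract ->
# transform (deidentify) -> load. Field names are assumptions.

def initialize(manifest):
    return {"request_id": manifest["request_id"], "items": list(manifest["accessions"])}

def prepare(state):
    # Data preparation: drop empty or malformed accession entries.
    state["prepared"] = [a for a in state["items"] if a]
    return state

def extract(state):
    # Pull records into a pre-deidentification staging area (PHI present).
    state["raw"] = [{"accession": a, "phi": True} for a in state["prepared"]]
    return state

def transform(state):
    # De-identification step: in this toy version, just clear the PHI flag.
    state["deid"] = [{**rec, "phi": False} for rec in state["raw"]]
    return state

def load(state):
    # Load into the post-deidentification warehouse, keyed by accession.
    state["warehouse"] = {rec["accession"]: rec for rec in state["deid"]}
    return state

def run(manifest):
    state = initialize(manifest)
    for phase in (prepare, extract, transform, load):
        state = phase(state)
    return state
```

Keeping each phase a separate function mirrors the modularity the abstract emphasizes: a failing phase can be retried or handed to a human without rerunning the whole pipeline.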
Daniel Paredes, Sankalp Talankar, Cheng Peng, Patrick Balian, Motomoti Lewis, Shunhun Yan, Wen-Shan Tsai PharmD, Ching-Yuan Chang, Debbie L Wilson, Wei-Hsuan Lo-Ciganic, Yonghui Wu
Opioid overdose and opioid use disorder (OUD) remain a growing public health issue in the United States, affecting 6.1 million individuals in 2022, more than double the 2.5 million in 2021. Accurately identifying opioid overdose and OUD-related information is critical for studying outcomes and developing interventions. This study aims to identify opioid overdose and OUD mentions and their related information from clinical narratives. We compared encoder-based large language models (LLMs) and decoder-based generative LLMs in extracting nine crucial concepts related to opioid overdose and OUD, including problematic opioid use. Through a cost-effective p-tuning algorithm, our decoder-based generative LLM, GatorTronGPT, achieved the best strict and lenient F1-scores of 0.8637 and 0.9057, demonstrating the efficiency of using generative LLMs for opioid overdose/OUD-related information extraction. This study provides a tool to systematically extract opioid overdose, OUD, and related information to facilitate opioid-related studies using clinical narratives.
"Identifying Opioid Overdose and Opioid Use Disorder and Related Information from Clinical Narratives Using Large Language Models." AMIA Joint Summits on Translational Science Proceedings, 2025, pp. 414-421. Published 2025-06-10. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12150707/pdf/
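The strict/lenient distinction in the reported F1-scores conventionally refers to exact versus overlapping span matches. A generic illustration of that convention (not necessarily the paper's exact scorer) is:

```python
def overlaps(a, b):
    """True if two (start, end) character spans share any position."""
    return a[0] < b[1] and b[0] < a[1]

def f1(gold, pred, lenient=False):
    """F1 over extracted spans: exact match (strict) or any overlap (lenient)."""
    match = overlaps if lenient else (lambda g, p: g == p)
    tp_pred = sum(any(match(g, p) for g in gold) for p in pred)
    precision = tp_pred / len(pred) if pred else 0.0
    recall = sum(any(match(g, p) for p in pred) for g in gold) / len(gold) if gold else 0.0
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0
```

Lenient scores are always at least as high as strict ones, which matches the reported 0.9057 versus 0.8637.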
Alisa Stolyar, Jamie Katz, Catherine Dymowski, Tierney Lyons, Aravind Parthasarathy, Hari Bharadwaj, Elaine Mormer, Catherine Palmer, Yanshan Wang
Hearing loss is a prevalent and impactful condition that affects millions globally. In 2022, the U.S. Food and Drug Administration (FDA) approved over-the-counter (OTC) hearing aids for individuals with mild to moderate hearing loss, establishing a distinct category separate from prescription hearing aids. This regulatory change may leave some patients, particularly those unfamiliar with hearing aids, without medical guidance in their decision-making process. To address this, our team developed the CLEARdashboard (Consumer Led Evidence - Amplification Resources dashboard) as an educational platform to assist users in comparing the technical specifications of various OTC hearing aids. In this study, we propose a new key feature of the CLEARdashboard that utilizes Natural Language Processing (NLP) methods to analyze product reviews from two prominent hearing aid online retailers. Analyzing product reviews with NLP is particularly helpful because these reviews often contain detailed, real-world insights into the performance and usability of hearing aids that may not be captured in technical specifications alone. We used NLP techniques to automatically summarize large volumes of user feedback into concise "pros and cons" lists, providing patients with a clearer understanding of the strengths and limitations of each device. This approach saves patients from manually sifting through extensive reviews and helps them make informed choices based on aggregated consumer experiences. The generated summaries were validated by three human evaluators to ensure the most comprehensive and reliable method of presenting this information, enhancing the decision-making process for individuals selecting OTC hearing aids.
"Content Analysis of Over-the-Counter Hearing Aid Reviews." AMIA Joint Summits on Translational Science Proceedings, 2025, pp. 546-555. Published 2025-06-10. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12150702/pdf/
Rare diseases affect approximately 1 in 11 Americans, yet their diagnosis remains challenging due to limited clinical evidence, low awareness, and lack of definitive treatments. Our project aims to accelerate rare disease diagnosis by developing a comprehensive informatics framework leveraging data mining, semantic web technologies, deep learning, and graph-based embedding techniques. However, our on-premises computational infrastructure faces significant challenges in scalability, maintenance, and collaboration. This study focuses on developing and evaluating a cloud-based computing infrastructure to address these challenges. By migrating to a scalable, secure, and collaborative cloud environment, we aim to enhance data integration, support advanced predictive modeling for differential diagnoses, and facilitate widespread dissemination of research findings to stakeholders, the research community, and the public. The migration is facilitated through a reliable, standardized workflow designed to ensure minimal disruption and maintain data integrity for existing research projects.
{"title":"Empowering Precision Medicine for Rare Diseases through Cloud Infrastructure Refactoring.","authors":"Hui Li, Jinlian Wang, Hongfang Liu","doi":"","DOIUrl":"","url":null,"abstract":"<p><p>Rare diseases affect approximately 1 in 11 Americans, yet their diagnosis remains challenging due to limited clinical evidence, low awareness, and lack of definitive treatments. Our project aims to accelerate rare disease diagnosis by developing a comprehensive informatics framework leveraging data mining, semantic web technologies, deep learning, and graph-based embedding techniques. However, our on-premises computational infrastructure faces significant challenges in scalability, maintenance, and collaboration. This study focuses on developing and evaluating a cloud-based computing infrastructure to address these challenges. By migrating to a scalable, secure, and collaborative cloud environment, we aim to enhance data integration, support advanced predictive modeling for differential diagnoses, and facilitate widespread dissemination of research findings to stakeholders, the research community, and the public. The migration is facilitated through a reliable, standardized workflow designed to ensure minimal disruption and maintain data integrity for existing research projects.</p>","PeriodicalId":72181,"journal":{"name":"AMIA Joint Summits on Translational Science proceedings. 
AMIA Joint Summits on Translational Science","volume":"2025 ","pages":"300-311"},"PeriodicalIF":0.0,"publicationDate":"2025-06-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12150693/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144276866","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}