Uday Suresh, Jessica S Ancker, Brian J Zikmund-Fisher, Natalie C Benda
Communicating health-related probabilities to patients and the public presents challenges, although multiple studies have demonstrated that comprehension and appropriate application of numbers can be promoted by matching presentation formats (e.g., percentages, bar charts, icon arrays) to communication goals (e.g., improving recall, decreasing worry, prompting action). We used this literature to create goal-driven, evidence-based guidance to support health communicators in conveying probabilities. We then conducted semi-structured interviews with 39 health communicators to understand communicators' goals for expressing probabilities, the formats they choose to convey probabilities, and their perceptions of prototypes of our "communicating numbers clearly" guidance. We found that communicators struggled to articulate granular goals for their communication, impeding their ability to select appropriate guidance. Future work should consider how best to support health communicators in selecting granular, differentiable goals to support broadly comprehensible information design.
Designing Support to help Health Communication Professionals Convey Numbers Clearly to the Public - A Needs Assessment and Formative Usability Evaluation. AMIA Annual Symposium Proceedings. 2023:1277-1286. Published 2024-01-11. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10785911/pdf/
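The guidance itself is not reproduced in the abstract. As a small illustration of the kind of format choice it addresses, a probability can be restated as a natural frequency ("N out of 100"), a presentation the risk-communication literature often recommends for lay audiences. The function below is a hypothetical sketch, not taken from the paper:

```python
def as_natural_frequency(p: float, denominator: int = 100) -> str:
    """Restate a probability as a natural frequency such as '7 out of 100',
    one of the presentation formats studied in risk communication."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be a probability between 0 and 1")
    n = round(p * denominator)
    return f"{n} out of {denominator}"
```

For example, `as_natural_frequency(0.07)` yields `"7 out of 100"`; the denominator can be adjusted to keep the numerator a whole number for rarer events.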
Randomized Clinical Trials (RCTs) measure an intervention's efficacy, but they may not be generalizable to a desired target population if the RCT is not equitable. Thus, representativeness of RCTs has become a national priority. Synthetic Controls (SCs) that incorporate observational data into RCTs have shown great potential to produce more efficient studies, but their equity is rarely considered. Here, we examine how to improve treatment effect estimation and equity of a trial by augmenting "on-trial" concurrent controls with SCs to form a Hybrid Control Arm (HCA). We introduce FRESCA - a framework to evaluate HCA construction methods using RCT simulations. FRESCA shows that performing propensity and equity adjustment when constructing the HCA leads to accurate population treatment effect estimates while meeting equity goals with potentially fewer "on-trial" patients. This work represents the first investigation of equity in HCA design that provides definitions, metrics, compelling questions, and resources for future work.
Framework for Research in Equitable Synthetic Control Arms. Nafis Neehal, Vibha Anand, Kristin P Bennett. AMIA Annual Symposium Proceedings. 2023:530-539. Published 2024-01-11. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10785851/pdf/
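FRESCA's actual estimators are not detailed in the abstract. The following is a minimal sketch of the general idea of propensity adjustment in a hybrid control arm: external controls are re-weighted so that their covariate distribution matches the trial's concurrent controls. All data, the single covariate, and the gradient-descent propensity model below are illustrative assumptions, not the paper's method:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative data: 50 concurrent trial controls and 200 external controls
# whose covariate x is shifted, so a naive pooled mean is biased.
x_trial = rng.normal(0.0, 1.0, 50)
x_ext = rng.normal(0.8, 1.0, 200)
y_trial = 1.0 + 0.5 * x_trial + rng.normal(0.0, 0.1, 50)
y_ext = 1.0 + 0.5 * x_ext + rng.normal(0.0, 0.1, 200)

# Propensity model P(source = trial | x), fit by plain gradient descent.
x = np.concatenate([x_trial, x_ext])
s = np.concatenate([np.ones(50), np.zeros(200)])  # 1 = concurrent control
w0 = w1 = 0.0
for _ in range(5000):
    p = 1.0 / (1.0 + np.exp(-(w0 + w1 * x)))
    w0 -= 0.5 * np.mean(p - s)
    w1 -= 0.5 * np.mean((p - s) * x)

# Odds weights re-balance external controls toward the trial population.
p_ext = 1.0 / (1.0 + np.exp(-(w0 + w1 * x_ext)))
wts = p_ext / (1.0 - p_ext)

naive = np.concatenate([y_trial, y_ext]).mean()
hybrid = (y_trial.sum() + (wts * y_ext).sum()) / (50 + wts.sum())
```

With these simulated data, the weighted (hybrid) control mean lands near the trial-population value of 1.0, while the naive pooled mean is pulled upward by the shifted external cohort; equity adjustment, as FRESCA studies it, would add further constraints on subgroup composition.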
Kelly J Craig, Yanrong Jerry Ji, Yuxin Chloe Zhang, Alexandra Berk, Amanda Zaleski, Omar Abdelsamad, Henriette Coetzer, Dorothea J Verbrugge, Guangying Hua
Enhancing diversity and inclusion in clinical trial recruitment, especially for historically marginalized populations including Black, Indigenous, and People of Color individuals, is essential. This practice ensures that generalizable trial results are achieved to deliver safe, effective, and equitable health and healthcare. However, recruitment is limited by two inextricably linked barriers - the inability to recruit and retain enough trial participants, and the lack of diversity among trial populations, in which racial and ethnic groups are underrepresented compared with national composition. To overcome these barriers, this study describes and evaluates a framework that combines 1) probabilistic and machine learning models to accurately impute missing race and ethnicity fields in real-world data, including medical and pharmacy claims, for the identification of eligible trial participants, 2) randomized controlled trial experimentation to deliver an optimal patient outreach strategy, and 3) stratified sampling techniques to effectively balance cohorts to continuously improve engagement and recruitment metrics.
Real-world Application of Racial and Ethnic Imputation and Cohort Balancing Techniques to Deliver Equitable Clinical Trial Recruitment. AMIA Annual Symposium Proceedings. 2023:319-328. Published 2024-01-11. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10785904/pdf/
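The abstract does not specify which probabilistic imputation models the framework uses. One widely used approach in this space is Bayesian surname and geocoding imputation (BISG), sketched below with made-up probability tables; real implementations draw the conditional distributions from Census surname frequencies and tract-level population shares:

```python
# Hypothetical probability tables, for illustration only.
P_RACE_GIVEN_SURNAME = {
    "garcia": {"hispanic": 0.90, "white": 0.06, "black": 0.01, "asian": 0.03},
    "smith": {"hispanic": 0.02, "white": 0.73, "black": 0.23, "asian": 0.02},
}
P_RACE_GIVEN_TRACT = {
    "tract_a": {"hispanic": 0.60, "white": 0.25, "black": 0.10, "asian": 0.05},
    "tract_b": {"hispanic": 0.05, "white": 0.70, "black": 0.20, "asian": 0.05},
}
P_RACE = {"hispanic": 0.18, "white": 0.60, "black": 0.13, "asian": 0.09}


def impute_race(surname: str, tract: str) -> dict:
    """Posterior P(race | surname, tract), assuming surname and tract are
    conditionally independent given race (the standard BISG assumption)."""
    unnormalized = {
        race: P_RACE_GIVEN_SURNAME[surname][race]
        * P_RACE_GIVEN_TRACT[tract][race]
        / P_RACE[race]
        for race in P_RACE
    }
    total = sum(unnormalized.values())
    return {race: v / total for race, v in unnormalized.items()}
```

The posterior probabilities (rather than a hard label) can then feed the stratified sampling step, so imputation uncertainty carries through to cohort balancing.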
Breast cancer is the second leading cause of cancer death for women in the United States. While breast cancer screening participation is the most effective method for early detection, screening rates have remained low. Given that understanding health perception is critical to understanding health decisions, our study utilized a Health Belief Model-based deep learning method to predict and examine public health beliefs about breast cancer and its screening behavior. The results showed that the trends in public health perception are sensitive to political (i.e., changes in health policy), sociological (i.e., representation of disease and its preventive care by a public figure or organization), psychological (i.e., social support), and environmental factors (i.e., the COVID-19 pandemic). Our study explores the roles social media can play in public health surveillance and in public health promotion of preventive care.
Use of Health Belief Model-based Deep Learning to Understand Public Health Beliefs in Breast Cancer Screening from Social Media before and during the COVID-19 Pandemic. Michelle Bak, Chieh-Li Chin, Jessie Chin. AMIA Annual Symposium Proceedings. 2023:280-288. Published 2024-01-11. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10785880/pdf/
Honor S Magon, Daniel Helkey, Tait Shanafelt, Daniel Tawfik
Physicians spend a large amount of time with the electronic health record (EHR), which the majority believe contributes to their burnout. However, there are limited standardized measures of physician EHR time. Vendor-derived metrics are standardized but may underestimate real-world EHR experience. Investigator-derived metrics may be more reliable but are not standardized, particularly with regard to the timeout thresholds defining inactivity. This study aimed to enable standardized investigator-derived metrics using conversion factors between raw event log-derived metrics and Signal (Epic Systems' standardized metric) for primary care physicians. This was an observational, retrospective longitudinal study of EHR raw event logs and Signal data from a quaternary academic medical center and its community affiliates in California, over a 6-month period. The study evaluated 242 physicians over 1370 physician-months, comparing 53.7 million event logs to 6850 Signal metrics, across five different time-based metrics. Results show that the inactivity thresholds for event log metric derivation that most closely approximate Signal metrics ranged from 90 seconds (Visit Navigator) to 360 seconds ("Pajama time") depending on the metric. Based on these data, conversion factors for investigator-derived metrics across a wide range of inactivity thresholds, via comparison with Signal metrics, are provided, which may allow researchers to consistently quantify EHR experience.
Creating Conversion Factors from EHR Event Log Data: A Comparison of Investigator-Derived and Vendor-Derived Metrics for Primary Care Physicians. AMIA Annual Symposium Proceedings. 2023:1115-1124. Published 2024-01-11. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10785859/pdf/
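The conversion factors themselves are in the paper, not the abstract, but investigator-derived EHR time is typically computed by summing the gaps between consecutive event-log timestamps and discarding any gap longer than an inactivity timeout. A minimal sketch of that computation (function name and event times are illustrative):

```python
from datetime import datetime, timedelta


def active_minutes(event_times, timeout_s):
    """Investigator-style EHR time: sum gaps between consecutive events,
    treating any gap longer than the inactivity timeout as idle."""
    ts = sorted(event_times)
    active = timedelta()
    for prev, cur in zip(ts, ts[1:]):
        gap = cur - prev
        if gap.total_seconds() <= timeout_s:
            active += gap
    return active.total_seconds() / 60


# A toy session: a burst of activity, a 9-minute pause, then more activity.
t0 = datetime(2023, 1, 1, 9, 0, 0)
events = [t0 + timedelta(seconds=s) for s in (0, 30, 60, 600, 630)]
```

Here `active_minutes(events, 90)` yields 1.5 minutes, while raising the timeout to 600 seconds counts the long mid-session gap as active and yields 10.5 minutes, which is why the threshold choice matters so much when comparing against a vendor metric like Signal.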
Aref Smiley, Te-Yi Tsai, Aileen Gabriel, Ihor Havrylchuk, Elena Zakashansky, Taulant Xhakli, Xingyue Huo, Wanting Cui, Fatemeh Shah-Mohammadi, Joseph Finkelstein
This study aims to develop machine learning (ML) algorithms to predict exercise exertion levels using physiological parameters collected from wearable devices. Real-time ECG, oxygen saturation, pulse rate, and revolutions per minute (RPM) data were collected at three intensity levels during a 16-minute cycling exercise. In parallel, throughout each exercise session, the study subjects' ratings of perceived exertion (RPE) were gathered once per minute. Each 16-minute exercise session was divided into a total of eight 2-minute windows. Each exercise window was labeled as "high exertion" or "low exertion" based on the self-reported RPEs. For each window, the gathered ECG data were used to derive heart rate variability (HRV) features in the temporal and frequency domains. Additionally, each window's averaged RPMs, heart rate, and oxygen saturation levels were calculated to form the full set of predictive features. The minimum redundancy maximum relevance algorithm was used to choose the best predictive features. The top selected features were then used to assess the accuracy of ten ML classifiers in predicting the next window's exertion level. The k-nearest neighbors (KNN) model showed the highest accuracy of 85.7% and the highest F1 score of 83%. An ensemble model showed the highest area under the curve (AUC) of 0.92. The suggested method can be used to automatically track perceived exercise exertion in real time.
Exercise Exertion Level Prediction Using Data from Wearable Physiologic Monitors. AMIA Annual Symposium Proceedings. 2023:653-662. Published 2024-01-11. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10785938/pdf/
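The abstract names the winning classifier but not its implementation. The sketch below shows the shape of the approach: classify a window-level feature vector (mean heart rate, mean RPM, an HRV summary) by k-nearest neighbors over labeled training windows. All feature values are invented for illustration:

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, x, k=3):
    """Majority vote among the k nearest training windows (Euclidean distance)."""
    nearest = sorted(
        (math.dist(features, x), label)
        for features, label in zip(train_X, train_y)
    )[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Invented per-window features: (mean heart rate, mean RPM, HRV summary).
train_X = [(95, 55, 60), (100, 58, 55), (98, 54, 58),
           (150, 80, 25), (155, 85, 20), (148, 82, 22)]
train_y = ["low", "low", "low", "high", "high", "high"]
```

A new window with elevated heart rate and suppressed HRV, e.g. `(152, 83, 23)`, falls among the "high exertion" neighbors; in the study this prediction is made for the *next* window, enabling real-time tracking.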
Zhe He, Shubo Tian, Arslan Erdengasileng, Karim Hanna, Yang Gong, Zhan Zhang, Xiao Luo, Mia Liza A Lustria
Viewing laboratory test results is patients' most frequent activity when accessing patient portals, but lab results can be very confusing for patients. Previous research has explored various ways to present lab results, but few efforts have attempted to provide tailored information support based on an individual patient's medical context. In this study, we collected and annotated textual interpretations of lab results in 251 health articles about laboratory tests from AHealthyMe.com. We then evaluated transformer-based language models, including BioBERT, ClinicalBERT, RoBERTa, and PubMedBERT, for recognizing key terms and their types. Using BioPortal's term search API, we mapped the annotated terms to concepts in major controlled terminologies. Results showed that PubMedBERT achieved the best F1 on both strict and lenient matching criteria. SNOMED CT had the best coverage of the terms, followed by LOINC and ICD-10-CM. This work lays the foundation for enhancing the presentation of lab results in patient portals by providing patients with contextualized interpretations of their lab results and individualized question prompts that they can, in turn, refer to during physician consults.
Annotation and Information Extraction of Consumer-Friendly Health Articles for Enhancing Laboratory Test Reporting. AMIA Annual Symposium Proceedings. 2023:407-416. Published 2024-01-11. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10785897/pdf/
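The exact strict and lenient matching criteria are not given in the abstract. A common convention in entity-extraction evaluation is exact-boundary matching for strict F1 and any-overlap matching for lenient F1, as in this sketch:

```python
def span_f1(gold, pred, lenient=False):
    """F1 over (start, end) character spans. Strict requires exact
    boundaries; lenient counts any overlapping gold/pred pair."""
    def overlaps(a, b):
        return a[0] < b[1] and b[0] < a[1]

    if lenient:
        tp_pred = sum(any(overlaps(p, g) for g in gold) for p in pred)
        tp_gold = sum(any(overlaps(g, p) for p in pred) for g in gold)
    else:
        tp_pred = tp_gold = len(set(pred) & set(gold))
    precision = tp_pred / len(pred) if pred else 0.0
    recall = tp_gold / len(gold) if gold else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


gold = [(0, 5), (10, 15)]
pred = [(0, 5), (11, 14)]
strict = span_f1(gold, pred)                  # only (0, 5) matches exactly
lenient = span_f1(gold, pred, lenient=True)   # (11, 14) overlaps (10, 15)
```

With these toy spans, strict F1 is 0.5 while lenient F1 is 1.0, which illustrates why the two criteria are reported separately.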
Oliver J Bear Don't Walk IV, Adrienne Pichon, Harry Reyes Nieva, Tony Sun, Jaan Altosaar, Karthik Natarajan, Adler Perotte, Peter Tarczy-Hornoch, Dina Demner-Fushman, Noémie Elhadad
Complete and accurate race and ethnicity (RE) patient information is important for many areas of biomedical informatics research, such as defining and characterizing cohorts, performing quality assessments, and identifying health inequities. Patient-level RE data is often inaccurate or missing in structured sources, but can be supplemented through clinical notes and natural language processing (NLP). While NLP has made many improvements in recent years with large language models, bias remains an often-unaddressed concern, with research showing that harmful and negative language is more often used for certain racial/ethnic groups than others. We present an approach to audit the learned associations of models trained to identify RE information in clinical text by measuring the concordance between model-derived salient features and manually identified RE-related spans of text. We show that while models perform well on the surface, there exist concerning learned associations and potential for future harms from RE-identification models if left unaddressed.
Auditing Learned Associations in Deep Learning Approaches to Extract Race and Ethnicity from Clinical Text. AMIA Annual Symposium Proceedings. 2023:289-298. Published 2024-01-11. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10785932/pdf/
Holly R Tomlin, Michel Wissing, Sai Tanikella, Preetinder Kaur, Linda Tabas
Professional medical publications writers (PMWs) perform a wide range of biomedical writing activities that has recently come to include translating biomedical publications into plain language summaries (PLS). The consumer health informatics (CHI) literature consistently describes the importance of incorporating health literacy principles into any natural language processing (NLP) app designed to communicate medical information to lay audiences, particularly patients. In this stepwise systematic review, we searched PubMed-indexed literature for CHI NLP-based apps that have the potential to assist PMWs in developing text-based PLS. Results showed that available apps are limited to patient portals and other technologies used to communicate medical text and reports from electronic health records. PMWs can apply the lessons learned from CHI NLP-based apps to supervise the development of tools specific to text simplification and summarization for PLS from biomedical publications.
Challenges and Opportunities for Professional Medical Publications Writers to Contribute to Plain Language Summaries (PLS) in an AI/ML Environment - A Consumer Health Informatics Systematic Review. AMIA Annual Symposium Proceedings. 2023:709-717. Published 2024-01-11. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10785924/pdf/
Evan T Sholle, Marcos A Davila, Kristin Kostka, Sajjad Abedian, Marika Cusick, Spencer Krichevsky, Jyotishman Pathak, Thomas R Campion
Obtaining reliable data on patient mortality is a critical challenge facing observational researchers seeking to conduct studies using real-world data. As these analyses are conducted more broadly using newly-available sources of real-world evidence, missing data can serve as a rate-limiting factor. We conducted a comparison of mortality data sources from different stakeholder perspectives - academic medical center (AMC) informatics service providers, AMC research coordinators, industry analytics professionals, and academics - to understand the strengths and limitations of differing mortality data sources: locally generated data from sites conducting research, data provided by governmental sources, and commercially available data sets. Researchers seeking to conduct observational studies using extant data should consider these factors in sourcing outcomes data for their populations of interest.
Comparative Merits of Available Mortality Data Sources for Clinical Research. AMIA Annual Symposium Proceedings. 2023:634-640. Published 2024-01-11. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10785894/pdf/