Liver transplantation often faces fairness challenges across subgroups defined by sensitive attributes such as age group, gender, and race/ethnicity. Machine learning models for outcome prediction can introduce additional biases. We therefore introduce the Fairness through the Equitable Rate of Improvement in Multitask Learning (FERI) algorithm for fair prediction of graft failure risk in liver transplant patients. FERI constrains subgroup loss by balancing learning rates and preventing any subgroup from dominating the training process. Our results show that FERI maintained high predictive accuracy, with AUROC and AUPRC comparable to baseline models. More importantly, FERI improved fairness without sacrificing accuracy: for gender, it reduced the demographic parity disparity by 71.74%, and for age group, it decreased the equalized odds disparity by 40.46%. The FERI algorithm thus advances fairness-aware predictive modeling in healthcare and provides a valuable tool for more equitable healthcare systems.
Title: FERI: A Multitask-based Fairness Achieving Algorithm with Applications to Fair Organ Transplantation. Authors: Can Li, Dejian Lai, Xiaoqian Jiang, Kai Zhang. Journal: AMIA Joint Summits on Translational Science proceedings. Pub Date: 2024-05-31. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11141863/pdf/
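The two fairness measures reported for FERI above, demographic parity disparity and equalized odds disparity, are commonly computed as gaps between subgroup rates. The following is a minimal sketch under stated assumptions, not the authors' implementation: it assumes binary labels and predictions, takes the disparity as the largest minus the smallest subgroup rate, and uses made-up toy data. The function names are hypothetical.

```python
from collections import defaultdict


def _rate(values):
    """Fraction of positive (1) entries in a non-empty list of 0/1 values."""
    return sum(values) / len(values)


def demographic_parity_disparity(y_pred, group):
    """Gap between the highest and lowest positive-prediction rate across subgroups."""
    preds_by_group = defaultdict(list)
    for pred, g in zip(y_pred, group):
        preds_by_group[g].append(pred)
    rates = [_rate(preds) for preds in preds_by_group.values()]
    return max(rates) - min(rates)


def equalized_odds_disparity(y_true, y_pred, group):
    """Larger of the true-positive-rate gap and the false-positive-rate gap.

    Assumes every subgroup contains at least one positive and one negative label.
    """
    pos_preds = defaultdict(list)  # predictions for truly positive cases, per subgroup
    neg_preds = defaultdict(list)  # predictions for truly negative cases, per subgroup
    for truth, pred, g in zip(y_true, y_pred, group):
        (pos_preds if truth == 1 else neg_preds)[g].append(pred)
    tprs = [_rate(preds) for preds in pos_preds.values()]
    fprs = [_rate(preds) for preds in neg_preds.values()]
    return max(max(tprs) - min(tprs), max(fprs) - min(fprs))


# Toy data, invented purely for illustration.
y_true = [1, 0, 1, 0, 1, 0, 1, 0]
y_pred = [1, 0, 1, 1, 1, 0, 0, 0]
group = ["F", "F", "F", "F", "M", "M", "M", "M"]

dp_gap = demographic_parity_disparity(y_pred, group)
eo_gap = equalized_odds_disparity(y_true, y_pred, group)
```

A value of 0 for either measure means the subgroups are treated identically under that criterion. Other definitions (ratios instead of differences, or averaging the TPR and FPR gaps) are also in use, so the percentages in the abstract should be read against whatever definition the paper itself adopts.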
Cancer outcomes are poor in resource-limited countries owing to high costs and an insufficient pathologist-to-population ratio. The advent of digital pathology has helped improve cancer outcomes; however, Whole Slide Image scanners are expensive and unaffordable in low-income countries. Microscope-acquired images, on the other hand, are cheap to collect and can be more viable for automating cancer detection. In this study, we propose LCH-Network, a novel method to identify the cancer mitotic count from microscope-acquired images. We introduced Label Mix and also synthesized images using GANs to handle data imbalance. Moreover, we applied progressive resolution to handle different image scales for mitotic localization. We achieved an F1-score of 0.71 and outperformed existing techniques. Our findings enable mitotic count estimation from microscopic images with a low-cost setup. Clinically, our method could help avoid presumptive treatment without a confirmed cancer diagnosis.
Title: Low-Cost Histopathological Mitosis Detection for Microscope-acquired Images. Authors: Bilal Shabbir, Saira Saleem, Iffat Aleem, Nida Babar, Hammad Farooq, Asif Loya, Hammad Naveed. Journal: AMIA Joint Summits on Translational Science proceedings. Pub Date: 2024-05-31. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11141803/pdf/
Pub Date: 2023-06-27. DOI: 10.48550/arXiv.2306.15651
Tanjida Kabir, Luyao Chen, M. Walji, L. Giancardo, Xiaoqian Jiang, Shayan Shams
Learning about diagnostic features and related clinical information from dental radiographs is important for dental research. However, the lack of expert-annotated data and convenient search tools poses challenges. Our primary objective is to design a search tool that uses a user's text query for oral-related research. The proposed framework, Contrastive LAnguage Image REtrieval Search for dental research (Dental CLAIRES), uses periapical radiographs and associated clinical details, such as periodontal diagnosis and demographic information, to retrieve the best-matched images for a text query. We applied a contrastive representation learning method to find images described by the user's text by maximizing the similarity score of positive pairs (true pairs) and minimizing the score of negative pairs (random pairs). Our model achieved a hit@3 ratio of 96% and a Mean Reciprocal Rank (MRR) of 0.82. We also designed a graphical user interface that allows researchers to verify the model's performance interactively.
Title: Dental CLAIRES: Contrastive LAnguage Image REtrieval Search for Dental Research. Journal: AMIA Joint Summits on Translational Science proceedings. Pub Date: 2023-06-27. DOI: https://doi.org/10.48550/arXiv.2306.15651
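The two retrieval metrics quoted for Dental CLAIRES, hit@3 and Mean Reciprocal Rank, can be computed from ranked result lists as sketched below. This is an illustrative sketch only: it assumes each query has exactly one correct image, and the image identifiers and rankings are invented.

```python
def hit_at_k(ranked_lists, targets, k=3):
    """Fraction of queries whose correct item appears in the top-k results."""
    hits = sum(1 for ranked, target in zip(ranked_lists, targets) if target in ranked[:k])
    return hits / len(targets)


def mean_reciprocal_rank(ranked_lists, targets):
    """Average of 1/rank of the correct item; a missing item contributes 0."""
    total = 0.0
    for ranked, target in zip(ranked_lists, targets):
        if target in ranked:
            total += 1.0 / (ranked.index(target) + 1)  # ranks are 1-based
    return total / len(targets)


# Invented rankings for three text queries; the correct image is "img_a" each time.
ranked = [
    ["img_a", "img_b", "img_c", "img_d"],  # correct image at rank 1
    ["img_b", "img_a", "img_c", "img_d"],  # correct image at rank 2
    ["img_d", "img_c", "img_b", "img_a"],  # correct image at rank 4
]
targets = ["img_a", "img_a", "img_a"]

hit3 = hit_at_k(ranked, targets, k=3)
mrr = mean_reciprocal_rank(ranked, targets)
```

Here two of the three queries place the correct image in the top three, and the reciprocal ranks are 1, 1/2 and 1/4. hit@k only asks whether the answer is near the top, while MRR also rewards placing it first, which is why the two are usually reported together.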
Boning Tong, Shannon L Risacher, Jingxuan Bao, Yanbo Feng, Xinkai Wang, Marylyn D Ritchie, Jason H Moore, Ryan Urbanowicz, Andrew J Saykin, Li Shen
Amyloid imaging has been widely used in Alzheimer's disease (AD) diagnosis and biomarker discovery by detecting regional amyloid plaque density. Amyloid measures must be normalized by a reference region to reduce noise and artifacts. To explore an optimal normalization strategy, we employ an automated machine learning (AutoML) pipeline, STREAMLINE, to perform binary AD diagnosis classification and permutation-based feature importance analysis with thirteen machine learning models. In this work, we perform a comparative study to evaluate the prediction performance and biomarker discovery capability of three amyloid imaging measures: one original measure and two measures normalized using two reference regions (i.e., the whole cerebellum and the composite reference region). Our AutoML results indicate that the dataset normalized by the composite reference region yields a higher balanced accuracy and identifies more AD-related regions based on the fractioned feature importance ranking.
Title: Comparing Amyloid Imaging Normalization Strategies for Alzheimer's Disease Classification using an Automated Machine Learning Pipeline. Journal: AMIA Joint Summits on Translational Science proceedings. Pub Date: 2023-06-16. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10283108/pdf/2306.pdf
Recently, hospitals and healthcare providers have made efforts to reduce surgical site infections (SSIs), as they are a major cause of surgical complications, a prominent reason for hospital readmission, and associated with significantly increased healthcare costs. Traditional surveillance methods for SSI rely on manual chart review, which can be laborious and costly. To assist the chart review process, we developed a long short-term memory (LSTM) model using structured electronic health record data to identify SSI. The top LSTM model achieved an average precision (AP) of 0.570 [95% CI 0.567, 0.573] and an area under the receiver operating characteristic curve (AUROC) of 0.905 [95% CI 0.904, 0.906], compared with the top traditional machine learning model, a random forest, which achieved an AP of 0.552 [95% CI 0.549, 0.555] and an AUROC of 0.899 [95% CI 0.898, 0.900]. Our LSTM model represents a step toward automated surveillance of SSIs, a critical component of quality improvement mechanisms.
Title: Developing an LSTM Model to Identify Surgical Site Infections using Electronic Healthcare Records. Authors: Amber C Kiser, Karen Eilbeck, Brian T Bucher. Journal: AMIA Joint Summits on Translational Science proceedings. Pub Date: 2023-06-16. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10283140/pdf/2161.pdf
Sitong Zhou, Kevin Lybarger, Meliha Yetisgen, Mari Ostendorf
Symptom information is primarily documented in free-text clinical notes and is not directly accessible for downstream applications. To address this challenge, information extraction approaches that can handle clinical language variation across institutions and specialties are needed. In this paper, we study domain generalization for symptom extraction using pretraining and fine-tuning data that differ from the target domain in institution, specialty, and/or patient population. We extract symptom events using a transformer-based joint entity and relation extraction method. To reduce reliance on domain-specific features, we propose a domain generalization method that dynamically masks frequent symptom words in the source domain. Additionally, we pretrain the transformer language model (LM) on task-related unlabeled texts for better representation. Our experiments indicate that masking and adaptive pretraining can significantly improve performance when the source domain is more distant from the target domain.
Title: Generalizing through Forgetting - Domain Generalization for Symptom Event Extraction in Clinical Notes. Journal: AMIA Joint Summits on Translational Science proceedings. Pub Date: 2023-06-16. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10283109/pdf/2329.pdf
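The idea of dynamically masking frequent symptom words, described in the abstract above, might look roughly like the sketch below. This is a guess at the general shape rather than the paper's procedure: the frequency cutoff `top_n`, the masking probability `p`, the `[MASK]` token, and the whitespace tokenization are all assumptions, and the symptom spans are invented.

```python
import random
from collections import Counter

MASK = "[MASK]"  # assumed placeholder token; the actual token depends on the tokenizer


def frequent_words(symptom_spans, top_n):
    """Return the top_n most frequent words across annotated symptom spans."""
    counts = Counter(word.lower() for span in symptom_spans for word in span.split())
    return {word for word, _ in counts.most_common(top_n)}


def dynamic_mask(tokens, frequent, p, rng):
    """Replace each frequent word with MASK with probability p.

    Called anew each time a sentence is sampled, so the masked positions
    change from epoch to epoch ("dynamic" masking).
    """
    return [MASK if tok.lower() in frequent and rng.random() < p else tok for tok in tokens]


# Invented source-domain symptom annotations.
spans = ["chest pain", "back pain", "pain", "shortness of breath"]
frequent = frequent_words(spans, top_n=1)

rng = random.Random(0)
tokens = "Patient reports chest pain today".split()
always_masked = dynamic_mask(tokens, frequent, p=1.0, rng=rng)
never_masked = dynamic_mask(tokens, frequent, p=0.0, rng=rng)
```

Hiding the most common surface forms some of the time pushes the model to rely on context rather than on memorized vocabulary, which is the intuition behind the "generalizing through forgetting" framing in the title.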
Per-/poly-fluoroalkyl substances (PFAS) are a group of manmade compounds with known human toxicity and evidence of contamination in drinking water throughout the US. We augmented our electronic health record data with geospatial information to classify PFAS exposure for our patients living in New Jersey. We explored the utility of three methods for classifying PFAS exposure that are commonly used in the literature, each based on a different boundary type: public water supplier service area boundary, municipality, and ZIP code. We also explored the intersection of the three boundaries. To study the potential for bias, we investigated known PFAS exposure-disease associations, specifically hypertension, thyroid disease, and parathyroid disease. We found that both the significance of the associations and the effect size varied by the method used to classify PFAS exposure. This has important implications for knowledge discovery and for environmental justice because, across cohorts, we found a larger proportion of Black/African-American patients to be PFAS-exposed.
Title: Investigating Three Classification Methods for Per/Poly-Fluoroalkyl Substance (PFAS) Exposure from Electronic Health Records And Potential for Bias. Authors: Lena M Davidson, Mary Regina Boland. Journal: AMIA Joint Summits on Translational Science proceedings. Pub Date: 2023-06-16. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10283112/pdf/2417.pdf
Tanjida Kabir, Luyao Chen, Muhammad F Walji, Luca Giancardo, Xiaoqian Jiang, Shayan Shams
Learning about diagnostic features and related clinical information from dental radiographs is important for dental research. However, the lack of expert-annotated data and convenient search tools poses challenges. Our primary objective is to design a search tool that uses a user's text query for oral-related research. The proposed framework, Contrastive LAnguage Image REtrieval Search for dental research (Dental CLAIRES), uses periapical radiographs and associated clinical details, such as periodontal diagnosis and demographic information, to retrieve the best-matched images for a text query. We applied a contrastive representation learning method to find images described by the user's text by maximizing the similarity score of positive pairs (true pairs) and minimizing the score of negative pairs (random pairs). Our model achieved a hit@3 ratio of 96% and a Mean Reciprocal Rank (MRR) of 0.82. We also designed a graphical user interface that allows researchers to verify the model's performance interactively.
Title: Dental CLAIRES: Contrastive LAnguage Image REtrieval Search for Dental Research. Journal: AMIA Joint Summits on Translational Science proceedings. Pub Date: 2023-06-16. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10283104/pdf/2343.pdf
Serena Jinchen Xie, Flavia P Kapos, Stephen J Mooney, Sean Mooney, Kari A Stephens, Cynthia Chen, Andrea L Hartzler, Abhishek Pratap
Real-world data (RWD) such as electronic health records (EHR) have great potential for secondary use by health systems and researchers. However, because EHR data are collected primarily to support efficient health care, they may not equitably represent local regions and populations, which limits the generalizability of insights learned from them. We assessed the geospatial representativeness of regions in a large health system's EHR data using a spatial analysis workflow, which provides a data-driven way to quantify geospatial representation and identify adequately represented regions. We applied the workflow to investigate geospatial patterns of patients with overweight/obesity and depression to find regional "hotspots" for potential targeted interventions. Our findings show the presence of geospatial bias in EHR data and demonstrate how the workflow identifies spatial clusters after adjusting for bias due to geospatial representativeness. This work highlights the importance of evaluating geospatial representativeness in RWD to guide targeted deployment of limited healthcare resources and generate equitable real-world evidence.
Title: Geospatial divide in real-world EHR data: Analytical workflow to assess regional biases and potential impact on health equity. Journal: AMIA Joint Summits on Translational Science proceedings. Pub Date: 2023-06-16. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10283143/pdf/2310.pdf
Hormonal therapy is an important adjuvant treatment for breast cancer patients, but discontinuation of such therapy is not uncommon. The goal of this paper is to model clinical communications, which have shown value in understanding medication discontinuation, to predict the discontinuation of hormonal therapy medications. Notably, we leveraged a Hypergraph Neural Network to capture hidden connections among patients that were inferred from clinical communications. Combining the content of clinical communications with demographics, insurance, and cancer stage information, our model achieved an AUC of 67.9%, which significantly outperformed baselines such as a Graph Convolutional Network (65.3%), Random Forest (62.7%), and Support Vector Machine (62.8%). Our study suggests that incorporating the hidden patient connections encoded in clinical communications into prediction models could boost their performance. Future research should consider combining structured medical records and clinical communications to better predict medication discontinuation.
Title: The Hidden Patient Connections: Predicting Hormonal Therapy Medication Discontinuation Using Hypergraph Neural Network on Clinical Communications. Authors: Qingyuan Song, Yunfei Hu, Congning Ni, Zhijun Yin. Journal: AMIA Joint Summits on Translational Science proceedings. Pub Date: 2023-06-16. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10283142/pdf/2435.pdf