Pub Date: 2024-06-01; Epub Date: 2024-08-22; DOI: 10.1109/ichi61247.2024.00030
Ibna Kowsar, Shourav B Rabbani, Manar D Samad
The imputation of missing values (IMV) in electronic health records tabular data is crucial to enable machine learning for patient-specific predictive modeling. While IMV methods are developed in biostatistics and recently in machine learning, deep learning-based solutions have shown limited success in learning tabular data. This paper proposes a novel attention-based missing value imputation framework that learns to reconstruct data with missing values leveraging between-feature (self-attention) or between-sample attentions. We adopt data manipulation methods used in contrastive learning to improve the generalization of the trained imputation model. The proposed self-attention imputation method outperforms state-of-the-art statistical and machine learning-based (decision-tree) imputation methods, reducing the normalized root mean squared error by 18.4% to 74.7% on five tabular data sets and 52.6% to 82.6% on two electronic health records data sets. The proposed attention-based missing value imputation method shows superior performance across a wide range of missingness (10% to 50%) when the values are missing completely at random.
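The evaluation setting described above (values missing completely at random, scored by normalized root mean squared error) can be illustrated with a minimal sketch. This is not the paper's attention-based method: column-mean imputation stands in as a simple baseline, and the function names, the seed parameter, and the choice to normalize by the standard deviation of the hidden true values are all assumptions made for illustration (other NRMSE normalizations exist).

```python
import math
import random


def mcar_mask(n_rows, n_cols, missing_rate, seed=0):
    """Boolean mask; True marks an entry hidden completely at random (MCAR)."""
    rng = random.Random(seed)
    return [[rng.random() < missing_rate for _ in range(n_cols)]
            for _ in range(n_rows)]


def mean_impute(data, mask):
    """Baseline imputer (assumption, not the paper's model): fill each hidden
    entry with the mean of the observed entries in the same column."""
    n_cols = len(data[0])
    means = []
    for j in range(n_cols):
        observed = [row[j] for row, m in zip(data, mask) if not m[j]]
        means.append(sum(observed) / len(observed) if observed else 0.0)
    return [[means[j] if m[j] else row[j] for j in range(n_cols)]
            for row, m in zip(data, mask)]


def nrmse(truth, imputed, mask):
    """RMSE over hidden entries divided by the standard deviation of the
    hidden true values (one common normalization, assumed here)."""
    pairs = [(t, p)
             for t_row, p_row, m_row in zip(truth, imputed, mask)
             for t, p, m in zip(t_row, p_row, m_row) if m]
    if not pairs:
        return 0.0
    rmse = math.sqrt(sum((t - p) ** 2 for t, p in pairs) / len(pairs))
    mean_t = sum(t for t, _ in pairs) / len(pairs)
    std_t = math.sqrt(sum((t - mean_t) ** 2 for t, _ in pairs) / len(pairs))
    return rmse / std_t if std_t > 0 else float("inf")
```

A typical use would be to draw a mask at each missingness level in the 10% to 50% range, impute, and compare NRMSE across methods; a perfect imputer scores 0 under this definition.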
Title: Attention-based Imputation of Missing Values in Electronic Health Records Tabular Data.
IEEE International Conference on Healthcare Informatics, vol. 2024, pp. 177-182. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11463999/pdf/
Pub Date: 2024-06-01; Epub Date: 2024-08-22; DOI: 10.1109/ichi61247.2024.00058
Qingqing Zhu, Xiuying Chen, Qiao Jin, Benjamin Hou, Tejas Sudharshan Mathai, Pritam Mukherjee, Xin Gao, Ronald M Summers, Zhiyong Lu
In radiology, Artificial Intelligence (AI) has significantly advanced report generation, but automatic evaluation of these AI-produced reports remains challenging. Current metrics, such as Conventional Natural Language Generation (NLG) and Clinical Efficacy (CE), often fall short in capturing the semantic intricacies of clinical contexts or overemphasize clinical details, undermining report clarity. To overcome these issues, our proposed method combines the expertise of professional radiologists with Large Language Models (LLMs), such as GPT-3.5 and GPT-4. Utilizing In-Context Instruction Learning (ICIL) and Chain of Thought (CoT) reasoning, our approach aligns LLM evaluations with radiologist standards, enabling detailed comparisons between human and AI-generated reports. This is further enhanced by a regression model that aggregates sentence evaluation scores. Experimental results show that our "Detailed GPT-4 (5-shot)" model achieves a correlation of 0.48, outperforming the METEOR metric by 0.19, while our "Regressed GPT-4" model shows even greater alignment (0.64) with expert evaluations, exceeding the best existing metric by a margin of 0.35. Moreover, the robustness of our explanations has been validated through a thorough iterative strategy. We plan to publicly release annotations from radiology experts, setting a new standard for accuracy in future assessments. This underscores the potential of our approach in enhancing the quality assessment of AI-driven medical reports.
Title: Leveraging Professional Radiologists' Expertise to Enhance LLMs' Evaluation for AI-generated Radiology Reports.
IEEE International Conference on Healthcare Informatics, vol. 2024, pp. 402-411. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11651630/pdf/
Pub Date: 2024-06-01; Epub Date: 2024-08-22; DOI: 10.1109/ichi61247.2024.00046
Aokun Chen, Daniel Paredes, Zehao Yu, Xiwei Lou, Roberta Brunson, Jamie N Thomas, Kimberly A Martinez, Robert J Lucero, Tanja Magoc, Laurence M Solberg, Urszula A Snigurska, Sarah E Ser, Mattia Prosperi, Jiang Bian, Ragnhildur I Bjarnadottir, Yonghui Wu
Delirium is an acute decline or fluctuation in attention, awareness, or other cognitive function that can lead to serious adverse outcomes. Despite the severe outcomes, delirium is frequently unrecognized and uncoded in patients' electronic health records (EHRs) due to its transient and diverse nature. Natural language processing (NLP), a key technology that extracts medical concepts from clinical narratives, has shown great potential in studies of delirium outcomes and symptoms. To assist in the diagnosis and phenotyping of delirium, we formed an expert panel to categorize diverse delirium symptoms, composed annotation guidelines, created a delirium corpus with diverse delirium symptoms, and developed NLP methods to extract delirium symptoms from clinical notes. We compared 5 state-of-the-art transformer models including 2 models (BERT and RoBERTa) from the general domain and 3 models (BERT_MIMIC, RoBERTa_MIMIC, and GatorTron) from the clinical domain. GatorTron achieved the best strict and lenient F1 scores of 0.8055 and 0.8759, respectively. We conducted an error analysis to identify challenges in annotating delirium symptoms and developing NLP systems. To the best of our knowledge, this is the first large language model-based delirium symptom extraction system. Our study lays the foundation for the future development of computable phenotypes and diagnosis methods for delirium.
Title: Identifying Symptoms of Delirium from Clinical Narratives Using Natural Language Processing.
IEEE International Conference on Healthcare Informatics, vol. 2024, pp. 305-311. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11670120/pdf/
Pub Date: 2024-06-01; Epub Date: 2024-08-22; DOI: 10.1109/ichi61247.2024.00020
Mattia Prosperi, Simone Marini, Christina Boucher
A problem extension of the longest common substring (LCS) between two texts is the enumeration of all LCSs given a minimum length k (ALCS-k), along with their positions in each text. In bioinformatics, an efficient solution to the ALCS-k problem for very long texts (genomes or metagenomes) can provide useful insights for discovering genetic signatures responsible for biological mechanisms. The ALCS-k problem has two additional requirements compared to the LCS problem: one is the minimum length k, and the other is that all common strings longer than k must be reported. We present an efficient, two-stage ALCS-k algorithm exploiting the spectrum of text substrings of length k (k-mers). Our approach yields a worst-case time complexity loglinear in the number of k-mers for the first stage, and an average-case time complexity loglinear in the number of common k-mers for the second stage (several orders of magnitude smaller than the total k-mer spectrum). The space complexity is linear in the first phase (disk-based), and on average linear in the second phase (disk- and memory-based). Tests performed on genomes of different organisms (including viruses, bacteria and animal chromosomes) show that run times are consistent with our theoretical estimates; further, comparisons with MUMmer4 show an asymptotic advantage with divergent genomes.
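The two-stage idea (collect k-mers, then work only on the k-mers shared by both texts) can be sketched in a few lines. This is a simplified in-memory illustration, not the authors' disk-based algorithm: it uses a hash index rather than sorting, so it does not reproduce the stated complexity bounds, and the function names and the (start_a, start_b, length) output format are assumptions. It reports maximal common substrings of length at least k.

```python
from collections import defaultdict


def kmer_index(text, k):
    """Stage 1 (sketch): map every k-mer of `text` to its start positions."""
    index = defaultdict(list)
    for i in range(len(text) - k + 1):
        index[text[i:i + k]].append(i)
    return index


def common_substrings(a, b, k):
    """Stage 2 (sketch): for each k-mer shared by `a` and `b`, extend the
    match to the right and report (start_a, start_b, length).

    A shared k-mer is used as a seed only if the match cannot be extended
    to the left, so every maximal match is reported exactly once.
    """
    index = kmer_index(a, k)
    found = []
    for j in range(len(b) - k + 1):
        for i in index.get(b[j:j + k], ()):
            # Skip seeds that sit inside a longer match starting further left.
            if i > 0 and j > 0 and a[i - 1] == b[j - 1]:
                continue
            length = k
            while (i + length < len(a) and j + length < len(b)
                   and a[i + length] == b[j + length]):
                length += 1
            found.append((i, j, length))
    return sorted(found)
```

For example, `common_substrings("ACGTACGT", "TTACGTAA", 4)` yields two matches of length 5, "ACGTA" and "TACGT", with their positions in each text. The left-extension check is a design choice that avoids reporting every k-mer inside a long match as a separate hit.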
Title: An average-case efficient two-stage algorithm for enumerating all longest common substrings of minimum length k between genome pairs.
IEEE International Conference on Healthcare Informatics, vol. 2024, pp. 93-102. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11412151/pdf/
Pub Date: 2024-06-01; Epub Date: 2024-08-22; DOI: 10.1109/ichi61247.2024.00012
Eloisa Nguyen, Rebecca Z Lin, Yang Gong, Cui Tao, Muhammad Tuan Amith
Many studies have examined the impact of exercise and other physical activities on the health outcomes of individuals. These physical activities entail an intricate sequence and series of physical anatomy, physiological movement, movement of the anatomy, etc. To better understand how these components interact with one another and their downstream impact on health outcomes, there needs to be an information model that conceptualizes all entities involved. In this study, we introduce our early development of an ontology model to computationally describe human physical activities and the various entities that compose each activity. We developed an open-source biomedical ontology called the Kinetic Human Movement Ontology that reuses OBO Foundry terminologies and is encoded in OWL2. We applied this ontology in modeling and linking a specific Tai Chi movement. The contribution of this work could enable modeling of information relating to human physical activity, like exercise, and lead towards information standardization of human movement for analysis. Future work will include expanding our ontology to include more expressive information and completely modeling entire sets of movement from human physical activity.
Title: Developing a computational representation of human physical activity and exercise using open ontology-based approach: a Tai Chi use case.
IEEE International Conference on Healthcare Informatics, vol. 2024, pp. 31-39. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11503552/pdf/
Pub Date: 2024-06-01; Epub Date: 2024-08-22; DOI: 10.1109/ichi61247.2024.00032
Richard Li Xu, Song Wang, Zewei Wang, Yuhan Zhang, Yunyu Xiao, Jyotishman Pathak, David Hodge, Yan Leng, S Craig Watkins, Ying Ding, Yifan Peng
Social factors like family background, education level, financial status, and stress can impact public health outcomes, such as suicidal ideation. However, the analysis of social factors for suicide prevention has been limited by the lack of up-to-date suicide reporting data, variations in reporting practices, and small sample sizes. In this study, we analyzed 172,629 suicide incidents from 2014 to 2020 utilizing the National Violent Death Reporting System Restricted Access Database (NVDRS-RAD). Logistic regression models were developed to examine the relationships between demographics and suicide-related circumstances. Trends over time were assessed, and Latent Dirichlet Allocation (LDA) was used to identify common suicide-related social factors. Mental health, interpersonal relationships, mental health treatment and disclosure, and school/work-related stressors were identified as the main themes of suicide-related social factors. This study also identified systemic disparities across various population groups, particularly concerning Black individuals, young people aged under 24, healthcare practitioners, and those with limited education backgrounds, shedding light on potential directions for demographic-specific suicide interventions.
Title: Analyzing Social Factors to Enhance Suicide Prevention Across Population Groups.
IEEE International Conference on Healthcare Informatics, vol. 2024, pp. 189-199. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11450796/pdf/
Pub Date: 2024-06-01; Epub Date: 2024-08-22; DOI: 10.1109/ichi61247.2024.00025
Yuxi Liu, Zhenhao Zhang, Shaowen Qin, Jiang Bian
Multivariate clinical time series data, such as those contained in Electronic Health Records (EHR), often exhibit high levels of irregularity, notably, many missing values and varying time intervals. Existing methods usually construct deep neural network architectures that combine recurrent neural networks and time decay mechanisms to model variable correlations, impute missing values, and capture the impact of varying time intervals. The complete data matrices thus obtained from the imputation task are used for downstream risk prediction tasks. This study aims to achieve more desirable imputation and prediction accuracy by performing both tasks simultaneously. We present a new multi-task deep neural network that incorporates the imputation task as an auxiliary task while performing risk prediction tasks. We validate the method on clinical time series imputation and in-hospital mortality prediction tasks using two publicly available EHR databases. The experimental results show that our method outperforms state-of-the-art imputation-prediction methods by significant margins. The results also empirically demonstrate that the incorporation of time decay mechanisms is a critical factor for superior imputation and prediction performance. The novel deep imputation-prediction network proposed in this study provides more accurate imputation and prediction results with EHR data. Future work should focus on developing more effective time decay mechanisms for simultaneously enhancing the imputation and prediction performance of multi-task learning models.
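The abstract does not specify its time decay mechanism, so the following is only a sketch of one widely used form, the exponential decay popularized by GRU-D style models: the last observed value is trusted less as the gap since it was recorded grows, and the estimate drifts toward the feature mean. In a trained network the decay parameters are learned; here `w` and `b` are fixed, hypothetical values, and all function names are illustrative.

```python
import math


def decay_weight(delta, w=0.5, b=0.0):
    """Decay factor in (0, 1]: exp(-max(0, w * delta + b)).

    `delta` is the time elapsed since the feature was last observed;
    `w` and `b` are assumed constants standing in for learned parameters.
    """
    return math.exp(-max(0.0, w * delta + b))


def decay_impute(times, values, feature_mean, w=0.5, b=0.0):
    """Fill missing entries (None) of one irregularly sampled feature.

    Each missing value is a blend of the last observation and the feature
    mean, weighted by how long ago that observation was made.
    """
    filled = []
    last_value, last_time = None, None
    for t, v in zip(times, values):
        if v is not None:
            last_value, last_time = v, t
            filled.append(v)
        elif last_value is None:
            # Nothing observed yet: fall back to the feature mean.
            filled.append(feature_mean)
        else:
            g = decay_weight(t - last_time, w, b)
            filled.append(g * last_value + (1.0 - g) * feature_mean)
    return filled
```

With a heart-rate reading at hour 0 and gaps at hours 1 and 5, the hour-1 estimate stays close to the reading while the hour-5 estimate is pulled most of the way toward the mean, which is how varying time intervals enter the imputation.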
Title: Multi-Task Deep Neural Networks for Irregularly Sampled Multivariate Clinical Time Series.
IEEE International Conference on Healthcare Informatics, vol. 2024, pp. 135-140. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11670123/pdf/
Pub Date: 2024-06-01; Epub Date: 2024-08-22; DOI: 10.1109/ichi61247.2024.00084
Liyue Fan, Ashley Bang, Luca Bonomi
Data synthesis can address important data availability challenges in biomedical informatics. Quantitative evaluation of generative models may help understand their applications to synthesizing biomedical data. This poster paper examines state-of-the-art generative models used in medical imaging, such as StyleGAN and DDPM models, and evaluates their performance in learning data manifolds and in the visible features of generated samples. Results show that existing generative models have much to improve based on the studied measures.
Title: Evaluating Generative Models in Medical Imaging.
IEEE International Conference on Healthcare Informatics, vol. 2024, pp. 553-555. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11508590/pdf/
Pub Date: 2024-06-01; Epub Date: 2024-08-22; DOI: 10.1109/ichi61247.2024.00009
Yuxi Liu, Zhenhao Zhang, Shaowen Qin, Flora D Salim, Jiang Bian, Antonio Jimeno Yepes
Predictive analytics using Electronic Health Records (EHRs) has become an active research area in recent years, especially with the development of deep learning techniques. A popular EHR data analysis paradigm in deep learning is patient representation learning, which aims to learn a condensed mathematical representation of individual patients. However, EHR data are often inherently irregular, i.e., data entries are captured at different times and with different contents due to the individualized needs of each patient. Most prior work has focused on deep neural networks with attention mechanisms that generate complete patient representations that can be readily used for downstream prediction tasks. However, such approaches fail to take patient similarity into account, which is generally used in clinical reasoning scenarios. This study presents a new Contrastive Graph Similarity Network for similarity calculation among patients in large EHR datasets. In particular, we apply graph-based similarity analysis that explicitly extracts the clinical characteristics of each patient and aggregates the information of similar patients to generate rich patient representations. Experimental results on real-world EHR databases demonstrate the effectiveness and superiority of our method for the tasks of vital signs imputation and ICU patient deterioration prediction.
Title: Fine-grained Patient Similarity Measuring using Contrastive Graph Similarity Networks.
Published in: IEEE International Conference on Healthcare Informatics, vol. 2024, pp. 1-10. Publication date: 2024-06-01. Type: Journal Article. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11654828/pdf/
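The abstract above describes aggregating the information of similar patients into each patient's representation. The following is only a minimal sketch of that general idea, not the authors' Contrastive Graph Similarity Network: it assumes each patient is already a fixed-length feature vector, uses cosine similarity as a stand-in similarity measure, and omits the graph network and contrastive training entirely. The names `cosine_similarity` and `aggregate_similar_patients` and the parameter `k` are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def aggregate_similar_patients(features, k=2):
    """Blend each patient's vector with its k most similar patients.

    Assumption for illustration: the patient itself gets weight 1.0 and each
    neighbor is weighted by its (non-negative) cosine similarity.
    """
    enriched = []
    for i, target in enumerate(features):
        # Score every other patient against the target patient.
        scored = [
            (cosine_similarity(target, other), j)
            for j, other in enumerate(features)
            if j != i
        ]
        scored.sort(reverse=True)
        neighbors = scored[:k]

        weights = [1.0] + [max(sim, 0.0) for sim, _ in neighbors]
        vectors = [target] + [features[j] for _, j in neighbors]
        total = sum(weights)

        # Similarity-weighted average, one feature dimension at a time.
        enriched.append([
            sum(w * v[d] for w, v in zip(weights, vectors)) / total
            for d in range(len(target))
        ])
    return enriched


if __name__ == "__main__":
    # Toy vectors, not real EHR data.
    patients = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
    print(aggregate_similar_patients(patients, k=1))
```

In this toy setting a patient with no similar neighbors keeps its own vector unchanged, because zero-similarity neighbors receive zero weight.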
Pub Date: 2023-06-01. Epub Date: 2023-12-11. DOI: 10.1109/ichi57859.2023.00022
Liyue Fan, Luca Bonomi
Deep neural networks have been increasingly integrated into healthcare applications to enable accurate predictive analyses. Sharing trained deep models not only facilitates knowledge integration in collaborative research efforts but also enables equitable access to computational intelligence. However, recent studies have shown that an adversary may leverage a shared model to learn the participation of a target individual in the training set. In this work, we investigate privacy-protecting model sharing for survival studies. Specifically, we pose three research questions. (1) Do deep survival models leak membership information? (2) How effective is differential privacy in defending against membership inference in deep survival analyses? (3) Are there other effects of differential privacy on deep survival analyses? Our study assesses the membership leakage in emerging deep survival models and develops differentially private training procedures to provide rigorous privacy protection. The experimental results show that deep survival models leak membership information and that our approach effectively reduces membership inference risks. The results also show that differential privacy introduces a limited performance loss and may improve model robustness in the presence of noisy data, compared to non-private models.
Title: Mitigating Membership Inference in Deep Survival Analyses with Differential Privacy.
Published in: IEEE International Conference on Healthcare Informatics, vol. 2023, pp. 81-90. Publication date: 2023-06-01. Type: Journal Article. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10751041/pdf/
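The abstract above mentions differentially private training procedures without giving their details. As a rough illustration only, the sketch below shows the gradient step commonly used in DP-SGD-style training: clip each example's gradient to a fixed L2 norm, sum the clipped gradients, add Gaussian noise scaled to the clipping bound, and average. Whether the authors use exactly this mechanism is an assumption. The names `clip_gradient` and `private_mean_gradient` and the parameters `max_norm` and `noise_multiplier` are hypothetical, and privacy accounting (tracking the privacy budget) is omitted.

```python
import math
import random


def clip_gradient(grad, max_norm):
    """Scale a gradient vector down so its L2 norm is at most max_norm."""
    norm = math.sqrt(sum(g * g for g in grad))
    scale = min(1.0, max_norm / norm) if norm > 0.0 else 1.0
    return [g * scale for g in grad]


def private_mean_gradient(per_example_grads, max_norm, noise_multiplier, rng):
    """Clip per-example gradients, add Gaussian noise to their sum, then average.

    The noise standard deviation is noise_multiplier * max_norm, so it is
    calibrated to the largest influence any single example can have on the sum.
    """
    clipped = [clip_gradient(g, max_norm) for g in per_example_grads]
    dim = len(clipped[0])
    summed = [sum(g[d] for g in clipped) for d in range(dim)]
    sigma = noise_multiplier * max_norm
    noisy = [s + rng.gauss(0.0, sigma) for s in summed]
    return [v / len(clipped) for v in noisy]


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed only to make the demo repeatable
    grads = [[3.0, 4.0], [0.0, 0.5]]  # toy per-example gradients
    print(private_mean_gradient(grads, max_norm=1.0, noise_multiplier=1.1, rng=rng))
```

Clipping bounds how much any single record can change the update, which is what lets the added noise mask an individual's participation in the training set.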