Pub Date: 2024-12-01; Epub Date: 2025-01-10; DOI: 10.1109/bibm62325.2024.10822379
Sungrim Moon, Jessica Maine, Ewy Mathe, Qian Zhu
Understanding the underlying etiologies of rare diseases may facilitate research across multiple conditions, enabling basket trial design and drug repurposing. In this study, we aligned clusters of rare diseases with Orphanet classifications to represent their shared etiologies and establish a foundation for further investigation into underlying biological mechanisms. Using the linearized Orphanet categories, we connected 35 clusters of rare diseases to 18 classifications. Significant associations were found between the categories "Rare Developmental Defects During Embryogenesis" and "Rare Inborn Errors of Metabolism" and the clusters in this study, suggesting that many rare diseases originating in the prenatal period or related to metabolism may present a substantial opportunity for success in future investigation.
{"title":"Aligning Orphanet Classification to Identify Disease Characteristics among Rare Disease Clusters.","authors":"Sungrim Moon, Jessica Maine, Ewy Mathe, Qian Zhu","doi":"10.1109/bibm62325.2024.10822379","journal":{"name":"Proceedings. IEEE International Conference on Bioinformatics and Biomedicine","volume":"2024","pages":"4561-4563"},"publicationDate":"2024-12-01","publicationTypes":"Journal Article","openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12422725/pdf/"}
Ensuring access to cancer treatment facilities is essential for delivering timely care, yet barriers such as geographic distance, socioeconomic factors, and social disparities can impede access in both rural and urban regions. This study measured locational health access for colorectal cancer in the context of hospitals and population distribution in Louisiana. It used data on census tracts, hospital beds, and providers from the National Cancer Institute. By mapping the distribution of these healthcare facilities, the study revealed the potential for identifying significant challenges in accessing specialized cancer care. There is no existing locational health access metric in this domain. The contribution of this paper is that it calculated the actual road distance between each census tract centroid and each cancer-treating hospital, and offers a new locational health access metric. This metric considers the number of beds and the number of oncologists as proxies for measuring cancer treatment facilities. The significance of this work is that it can be applied at a larger scope (such as the whole country), with more variables, and to other diseases treated by hospitals. It also has public policy implications: hospital sites can be chosen through such data-driven analysis.
{"title":"A New Metric for Measuring Locational Health Access for Cancer Treatment.","authors":"Subhajit Chakrabarty, Udaysinh Rathod, Sweta Singh, Debarshi Roy, Ismael Maya","doi":"10.1109/BIBM62325.2024.10822220","journal":{"name":"Proceedings. IEEE International Conference on Bioinformatics and Biomedicine","volume":"2024","pages":"6582-6588"},"publicationDate":"2024-12-01","publicationTypes":"Journal Article","openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12241303/pdf/"}
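The abstract above names the ingredients of the access metric (road distance from each census tract centroid to each hospital, number of beds, number of oncologists) but does not give its formula. The sketch below is therefore not the paper's metric; it is a generic gravity-style accessibility score built from the same ingredients, and `access_score`, `beta`, `oncologist_weight`, and all numbers are hypothetical choices made only for illustration.

```python
def access_score(distances_km, hospitals, beta=1.0, oncologist_weight=1.0):
    """Gravity-style accessibility for one census tract (illustrative only).

    distances_km: {hospital_id: road distance from the tract centroid, in km}
    hospitals:    {hospital_id: {"beds": int, "oncologists": int}}
    beta:         distance-decay exponent (assumed, not taken from the paper)
    """
    score = 0.0
    for hospital_id, distance in distances_km.items():
        info = hospitals[hospital_id]
        # Capacity proxy: beds plus a weighted count of oncologists.
        capacity = info["beds"] + oncologist_weight * info["oncologists"]
        # Floor the distance at 1 km so a hospital inside the tract
        # does not produce a division by zero.
        score += capacity / max(distance, 1.0) ** beta
    return score


# Made-up hospitals and tracts, purely to exercise the function.
hospitals = {
    "A": {"beds": 100, "oncologists": 5},
    "B": {"beds": 50, "oncologists": 2},
}
near_tract = access_score({"A": 10.0, "B": 20.0}, hospitals)
far_tract = access_score({"A": 100.0, "B": 200.0}, hospitals)
print(near_tract, far_tract)
```

With this kind of score, a tract close to high-capacity hospitals ranks above a distant one; the decay exponent controls how quickly distance erodes the benefit of capacity.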
Pub Date: 2024-12-01; Epub Date: 2025-01-10; DOI: 10.1109/bibm62325.2024.10822002
Shixue Sun, Rosemary Mejia, An N Dang Do, Qian Zhu
Juvenile neuronal ceroid lipofuscinosis (CLN3) is a rare neurodegenerative disorder lacking effective therapies. This study aimed to develop a drug repurposing approach to identify potential therapeutic candidates for CLN3 using its protein expression profile (CPEP) constructed from proteomics data. Differentially expressed proteins were identified and used to query the iLINCS database, yielding 60 FDA-approved drugs with reversal effects on CPEP. These candidates were further prioritized based on regulation strength, coverage, and blood-brain barrier permeability. Top candidates include Vorinostat and Cyclosporine, which have shown promise due to their significant regulation scores and blood-brain barrier permeation probability. These results provide opportunities for further investigation of novel therapies for CLN3.
{"title":"Identifying Drug Repurposing Candidates for CLN3 Targeting Proteomics Expression Profile.","authors":"Shixue Sun, Rosemary Mejia, An N Dang Do, Qian Zhu","doi":"10.1109/bibm62325.2024.10822002","journal":{"name":"Proceedings. IEEE International Conference on Bioinformatics and Biomedicine","volume":"2024","pages":"4572-4574"},"publicationDate":"2024-12-01","publicationTypes":"Journal Article","openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12434628/pdf/"}
The global decline in HIV incidence has not been mirrored in the United States, where young adults (ages 18-29) continue to account for a significant portion of new infections. In this study, we leverage the All of Us (AoU) Research Program's extensive electronic health records (EHRs) and health survey data to develop machine learning models capable of predicting HIV diagnoses at least three months before clinical identification. Among various models tested, the Support Vector Machine (SVM) model demonstrated a balanced performance, integrating clinically relevant features with robust predictive accuracy (AUC = 0.91). Risky drinking behaviors emerged as consistent top predictors across models, highlighting the importance of targeted interventions in this age group. Our findings underscore the potential of predictive analytics in enhancing HIV prevention strategies and informing public health efforts aimed at reducing HIV transmission among emerging adults.
{"title":"Predicting HIV Diagnosis Among Emerging Adults Using Electronic Health Records and Health Survey Data in All of Us Research Program.","authors":"Balu Bhasuran, Yiyang Liu, Mattia Prosperi, Karen MacDonell, Sylvie Naar, Zhe He","doi":"10.1109/bibm62325.2024.10822296","journal":{"name":"Proceedings. IEEE International Conference on Bioinformatics and Biomedicine","volume":"2024","pages":"5433-5440"},"publicationDate":"2024-12-01","publicationTypes":"Journal Article","openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11823436/pdf/"}
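The abstract above reports model quality as an AUC. As a reminder of what that number measures, the sketch below computes AUC directly from its rank interpretation: the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counted as one half. The function name `roc_auc` and the toy labels and scores are illustrative assumptions and have nothing to do with the study's data.

```python
def roc_auc(labels, scores):
    """AUC via pairwise comparison of positive and negative scores."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half a win
    return wins / (len(positives) * len(negatives))


# Toy data: 1 = later diagnosed, 0 = not diagnosed (made-up values).
labels = [1, 0, 1, 0, 0, 1]
scores = [0.9, 0.2, 0.6, 0.7, 0.1, 0.8]
auc = roc_auc(labels, scores)
print(round(auc, 3))
```

The pairwise loop is quadratic in the number of cases, which is fine for illustration; library implementations use a sort-based computation that gives the same value.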
Pub Date: 2024-12-01; DOI: 10.1109/bibm62325.2024.10822848
Arman Behnam, Muskan Garg, Xingyi Liu, Maria Vassilaki, Jennifer St Sauver, Ronald C Petersen, Sunghwan Sohn
Mild Cognitive Impairment (MCI) is a transitional stage between normal cognitive aging and dementia. Some individuals with MCI revert to normal, while others progress to dementia. There are limited studies using explainable artificial intelligence on longitudinal data, particularly including genotypes, biomarkers and chronic diseases, to explore these differences. This study introduces a novel approach to understanding MCI progression using explainable graph neural networks. Utilizing longitudinal temporal data, we constructed a comprehensive graph representation of each individual in the study cohort. Our temporal graph convolutional network achieved 72.4% accuracy in predicting MCI transitions, while our causal explanation method outperformed existing explanation techniques in stability, accuracy, and faithfulness. We identified a causal subgraph with informative variables including hypertension, arrhythmia, congestive heart failure, coronary artery disease, stroke, lipid-related issues, and sex.
{"title":"Causal Explanation from Mild Cognitive Impairment Progression using Graph Neural Networks.","authors":"Arman Behnam, Muskan Garg, Xingyi Liu, Maria Vassilaki, Jennifer St Sauver, Ronald C Petersen, Sunghwan Sohn","doi":"10.1109/bibm62325.2024.10822848","journal":{"name":"Proceedings. IEEE International Conference on Bioinformatics and Biomedicine","volume":"2024","pages":"6349-6355"},"publicationDate":"2024-12-01","publicationTypes":"Journal Article","openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11803575/pdf/"}
Pub Date: 2024-12-01; Epub Date: 2025-01-10; DOI: 10.1109/bibm62325.2024.10822513
Jaber Valinejad, Yanji Xu, Qian Zhu
A rare disease affects fewer than 200,000 individuals in the United States, and some are so rare that only a handful of people are affected. According to the U.S. Food and Drug Administration (FDA), there are 1,268 approved orphan drugs available for treating these conditions. However, potentially beneficial drugs can also have side effects. Some adverse events, while serious, may be rare, making them difficult to identify or quantify in randomized controlled trials. Understanding these events is critical for improving patient safety and treatment outcomes. To better assess these risks, we aimed to summarize adverse drug events for rare diseases using the FDA Adverse Event Reporting System (FAERS). This study offers a foundation for future research on improving drug safety in rare diseases.
{"title":"An application of studying FAERS data to Enhance Drug Safety and Treatment Outcomes in Rare Diseases.","authors":"Jaber Valinejad, Yanji Xu, Qian Zhu","doi":"10.1109/bibm62325.2024.10822513","journal":{"name":"Proceedings. IEEE International Conference on Bioinformatics and Biomedicine","volume":"2024","pages":"4575-4578"},"publicationDate":"2024-12-01","publicationTypes":"Journal Article","openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12419809/pdf/"}
Lung cancer remains a predominant cause of cancer-related deaths, with notable disparities in incidence and outcomes across racial and gender groups. This study addresses these disparities by developing a computational framework leveraging explainable artificial intelligence (XAI) to identify both patient- and cohort-specific biomarker genes in lung cancer. Specifically, we focus on two lung cancer subtypes, Lung Adenocarcinoma (LUAD) and Lung Squamous Cell Carcinoma (LUSC), examining distinct racial and sex-specific cohorts: African American males (AAMs) and European American males (EAMs). This study structures classification tasks based on disease conditions rather than racial labels to avoid race-specific imbalance. We constructed four classification tasks, one three-class problem (LUAD-LUSC-HEALTHY) and three two-class problems (LUAD-LUSC, LUAD-HEALTHY, LUSC-HEALTHY), to interpret the disease behavior of the patients in terms of genes and pathways. This methodology allows a LUAD or LUSC patient to be analyzed via multiple classifications, yielding robust disparity information for every patient. This preliminary work reports the disparity information for LUAD only. Using transcriptome data from The Cancer Genome Atlas (TCGA) and Genotype-Tissue Expression (GTEx) projects, we processed samples for LUAD, LUSC, and HEALTHY cohorts. We applied machine learning models, including convolutional neural network (CNN), logistic regression (LR), naïve Bayesian classifier (NB), support vector machine (SVM), random forest (RF), and extreme gradient boosting (XGBoost), for the classification. The SHapley Additive exPlanation (SHAP)-based interpretation of the best-performing classification model uncovered cohort-specific genes and pathways related to health disparities between LUAD-AAM and LUAD-EAM cohorts.
{"title":"Interpreting Lung Cancer Health Disparity between African American Males and European American Males.","authors":"Masrur Sobhan, Md Mezbahul Islam, Ananda Mohan Mondal","doi":"10.1109/bibm62325.2024.10822014","journal":{"name":"Proceedings. IEEE International Conference on Bioinformatics and Biomedicine","volume":"2024","pages":"7141-7143"},"publicationDate":"2024-12-01","publicationTypes":"Journal Article","openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11753458/pdf/"}
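The abstract above relies on SHAP, which attributes a model's output to its input features using Shapley values. For readers unfamiliar with the idea, the sketch below computes exact Shapley values by enumerating every feature subset, which is feasible only for a handful of features (SHAP itself uses approximations for real models). The function `shapley_values` and the toy value function `toy_model` are hypothetical and are not the study's model or genes.

```python
from itertools import combinations
from math import factorial


def shapley_values(value_fn, n_features):
    """Exact Shapley values for a set function over features 0..n-1.

    value_fn takes a frozenset of 'present' feature indices and returns
    the model output when only those features are available.
    """
    n = n_features
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for k in range(n):  # size of the coalition that excludes feature i
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = frozenset(subset)
                # Marginal contribution of feature i to coalition s.
                phi[i] += weight * (value_fn(s | {i}) - value_fn(s))
    return phi


def toy_model(present):
    """Made-up 'risk score' over three features with one interaction."""
    score = 0.0
    if 0 in present:
        score += 2.0
    if 1 in present:
        score += 1.0
    if 0 in present and 2 in present:
        score += 4.0  # interaction term, split evenly between 0 and 2
    return score


phi = shapley_values(toy_model, 3)
print([round(p, 6) for p in phi])
```

The attributions sum to the difference between the output with all features present and the output with none, which is the efficiency property that makes per-patient and per-cohort attributions comparable.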
Mammogram image analysis has benefited from advancements in artificial intelligence (AI), particularly through the use of Siamese networks, which, like radiologists, compare current and prior mammogram images to enhance diagnostic accuracy. One of the main challenges in employing Siamese networks for this purpose is selecting an effective distance function. Given the complexity of mammogram images and the high correlation between current and prior images, traditional distance functions in Siamese networks often fall short in capturing the subtle, non-linear differences between these correlated features. This study explores the impact of incorporating non-linear and correlation-sensitive distance functions within a Siamese network framework for analyzing paired mammogram images. We benchmarked different distance functions, including Euclidean, Manhattan, Mahalanobis, Radial Basis Function (RBF), and cosine, and introduced a novel combination of RBF with Matern Covariance. Our evaluation revealed that the RBF with Matern Covariance consistently outperformed other functions, emphasizing the importance of addressing non-linearity and correlation in this context. For instance, the ResNet50 model, when paired with this distance function, achieved an accuracy of 0.938, sensitivity of 0.921, precision of 0.955, specificity of 0.958, F1 score of 0.930, and AUC of 0.940. We observed similarly strong performance across other models as well. Furthermore, the robustness of our approach was confirmed through evaluation on a dataset of 30 cross-validation samples, demonstrating its generalizability. These findings underscore the effectiveness of non-linear and correlation-based distance functions in Siamese networks for improving the performance and generalization of mammogram image analysis. All code used in this paper is available at https://github.com/NabaviLab/Benchmarking_Distance_Functions_in_Siamese_Networks.
{"title":"Benchmarking Distance Functions in Siamese Networks for Current and Prior Mammogram Image Analysis.","authors":"Sahand Hamzehei, Afsana Ahsan Jeny, Annie Jin, Clifford Yang, Sheida Nabavi","doi":"10.1109/bibm62325.2024.10822291","journal":{"name":"Proceedings. IEEE International Conference on Bioinformatics and Biomedicine","volume":"2024","pages":"1996-2003"},"publicationDate":"2024-12-01","publicationTypes":"Journal Article","openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12250141/pdf/"}
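To make the benchmarked distance functions above concrete, the sketch below gives textbook definitions of several of them for a pair of embedding vectors, such as the outputs of the two branches of a Siamese network. It is written in plain Python under stated assumptions: the Matern kernel shown is the standard smoothness-3/2 form, `gamma` and `length_scale` are arbitrary defaults, Mahalanobis distance is omitted because it needs an estimated covariance matrix, and the paper's specific combination of RBF with Matern covariance is not given in the abstract and is not reproduced here.

```python
import math


def euclidean(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def manhattan(u, v):
    return sum(abs(a - b) for a, b in zip(u, v))


def cosine_distance(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)  # assumes neither vector is all zeros


def rbf_kernel(u, v, gamma=1.0):
    """Similarity in (0, 1]; equals 1 when the vectors are identical."""
    return math.exp(-gamma * euclidean(u, v) ** 2)


def matern32_kernel(u, v, length_scale=1.0):
    """Matern covariance with smoothness 3/2 (standard closed form)."""
    r = math.sqrt(3.0) * euclidean(u, v) / length_scale
    return (1.0 + r) * math.exp(-r)


# Hypothetical embeddings of a current and a prior image.
current = [0.0, 0.0]
prior = [3.0, 4.0]
print(euclidean(current, prior), manhattan(current, prior))
print(rbf_kernel(current, prior, gamma=0.01), matern32_kernel(current, prior, length_scale=5.0))
```

The two kernels return similarities rather than distances; one common convention is to use one minus the kernel value as the dissimilarity fed to the network's comparison head.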
Deep learning (DL) has transformed medical image classification; however, its efficacy is often limited by significant data imbalance due to far fewer cases (minority class) compared to controls (majority class). It has been shown that synthetic image augmentation techniques can simulate clinical variability, leading to enhanced model performance. We hypothesize that they could also mitigate the challenge of data imbalance, thereby addressing overfitting to the majority class and enhancing generalization. Recently, latent diffusion models (LDMs) have shown promise in synthesizing high-quality medical images. This study evaluates the effectiveness of a text-guided image-to-image LDM in synthesizing disease-positive chest X-rays (CXRs) and augmenting a pediatric CXR dataset to improve classification performance. We first establish baseline performance by fine-tuning an ImageNet-pretrained Inception-V3 model on class-imbalanced data for two tasks: normal vs. pneumonia and normal vs. bronchopneumonia. Next, we fine-tune individual text-guided image-to-image LDMs to generate CXRs showing signs of pneumonia and bronchopneumonia. The Inception-V3 model is retrained on an updated dataset that includes these synthesized images as part of augmented training and validation sets. Classification performance is compared against the baseline using balanced accuracy, sensitivity, specificity, F-score, Matthews correlation coefficient (MCC), Kappa, and Youden's index. Results show that the augmentation significantly improves Youden's index (p<0.05) and markedly enhances other metrics, indicating that data augmentation using LDM-synthesized images is an effective strategy for addressing class imbalance in medical image classification.
{"title":"Addressing Class Imbalance with Latent Diffusion-based Data Augmentation for Improving Disease Classification in Pediatric Chest X-rays.","authors":"Sivaramakrishnan Rajaraman, Zhaohui Liang, Zhiyun Xue, Sameer Antani","doi":"10.1109/bibm62325.2024.10822172","journal":{"name":"Proceedings. IEEE International Conference on Bioinformatics and Biomedicine","volume":"2024","pages":"5059-5066"},"publicationDate":"2024-12-01","publicationTypes":"Journal Article","openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11936509/pdf/"}
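The evaluation metrics listed in the abstract above can all be derived from a binary confusion matrix. The sketch below shows the standard formulas in plain Python; the function name `binary_metrics` and the example counts are illustrative assumptions, not results from the study, and the function assumes every denominator is nonzero.

```python
import math


def binary_metrics(tp, fp, tn, fn):
    """Standard metrics from confusion-matrix counts (no zero-division guard)."""
    n = tp + fp + tn + fn
    sensitivity = tp / (tp + fn)              # recall on the disease class
    specificity = tn / (tn + fp)              # recall on the normal class
    precision = tp / (tp + fp)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    mcc = (tp * tn - fp * fn) / math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
    )
    observed = (tp + tn) / n                  # observed agreement
    expected = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / n ** 2
    return {
        "balanced_accuracy": (sensitivity + specificity) / 2,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "f1": f1,
        "mcc": mcc,
        "kappa": (observed - expected) / (1 - expected),  # Cohen's kappa
        "youden": sensitivity + specificity - 1,          # Youden's J
    }


# Made-up counts for an imbalanced test set (50 cases, 100 controls).
metrics = binary_metrics(tp=40, fp=10, tn=90, fn=10)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Balanced accuracy, MCC, and Youden's index all weigh performance on the minority class explicitly, which is why they are more informative than plain accuracy when cases are far rarer than controls.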
Pub Date: 2024-12-01; DOI: 10.1109/bibm62325.2024.10822585
Ratri Mukherjee, Kishlay Jha
Biomedical text classification refers to the task of annotating a biomedical text with its relevant labels from a candidate label set. Most existing approaches operate in a fully supervised setting and thus rely heavily on human-annotated training data, which is both labor-intensive and expensive to produce. To address this, we propose to formulate biomedical text classification under the zero-shot learning (ZSL) paradigm, which does not require any labeled training data and relies only on label surface names for training and inference. Specifically, we propose a new context-aware contrastive learning technique for ZSL that fully exploits the context information present in the biomedical text to generate the semantically enriched feature representations needed for accurate zero-shot biomedical text classification. Unlike existing contrastive learning approaches that typically employ random text segmentation strategies to generate contrastive pairs, our approach utilizes the context information inherently present in biomedical text to generate semantically meaningful contrastive pairs. Extensive experiments on the largest available biomedical corpus validate the effectiveness of the proposed approach.
{"title":"Context-Aware Contrastive Representation Learning for Zero-Shot Biomedical Text Classification.","authors":"Ratri Mukherjee, Kishlay Jha","doi":"10.1109/bibm62325.2024.10822585","journal":{"name":"Proceedings. IEEE International Conference on Bioinformatics and Biomedicine","volume":"2024","pages":"3611-3614"},"publicationDate":"2024-12-01","publicationTypes":"Journal Article","openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11916847/pdf/"}
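The abstract above describes training on contrastive pairs but does not state the loss function. As a hedged illustration of how a contrastive objective rewards semantically matched pairs, the sketch below implements the widely used InfoNCE loss over cosine similarities. Treating InfoNCE as the objective is an assumption, and the function names, the temperature value, and the toy two-dimensional embeddings are all hypothetical.

```python
import math


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)  # assumes neither vector is all zeros


def info_nce(anchor, positive, negatives, temperature=0.1):
    """InfoNCE loss for one anchor: -log softmax score of the positive."""
    logits = [cosine(anchor, positive) / temperature]
    logits += [cosine(anchor, neg) / temperature for neg in negatives]
    # Numerically stable log-sum-exp.
    top = max(logits)
    log_sum_exp = top + math.log(sum(math.exp(x - top) for x in logits))
    return log_sum_exp - logits[0]


# Toy embeddings: an anchor text segment, a contextually related segment,
# and two unrelated segments (all values made up).
anchor = [1.0, 0.0]
related = [0.9, 0.1]
unrelated = [[0.0, 1.0], [-1.0, 0.0]]

loss_matched = info_nce(anchor, related, unrelated)
loss_mismatched = info_nce(anchor, unrelated[0], [related, unrelated[1]])
print(loss_matched, loss_mismatched)
```

The loss is small when the anchor is closer to its paired segment than to the negatives and large otherwise, so how the pairs are constructed directly shapes what the encoder learns to treat as similar.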