Proceedings. IEEE International Conference on Bioinformatics and Biomedicine: Latest Articles

Aligning Orphanet Classification to Identify Disease Characteristics among Rare Disease Clusters.

Proceedings. IEEE International Conference on Bioinformatics and Biomedicine

Pub Date : 2024-12-01 Epub Date: 2025-01-10 DOI: 10.1109/bibm62325.2024.10822379

Sungrim Moon, Jessica Maine, Ewy Mathe, Qian Zhu

Understanding the underlying etiologies of rare diseases may facilitate research across multiple conditions, enabling basket trial design and drug repurposing. In this study, we aligned clusters of rare diseases with Orphanet classifications to represent their shared etiologies and to establish a foundation for further investigation into the underlying biological mechanisms. Using the linearized Orphanet categories, we connected 35 clusters of rare diseases to 18 classifications. Significant associations were found between the clusters in this study and the categories "Rare Developmental Defects During Embryogenesis" and "Rare Inborn Errors of Metabolism", suggesting that rare diseases originating in the prenatal period or related to metabolism may present a substantial opportunity for success in future investigation.


Citations: 0

A New Metric for Measuring Locational Health Access for Cancer Treatment.

Proceedings. IEEE International Conference on Bioinformatics and Biomedicine

Pub Date : 2024-12-01 DOI: 10.1109/BIBM62325.2024.10822220

Subhajit Chakrabarty, Udaysinh Rathod, Sweta Singh, Debarshi Roy, Ismael Maya

Ensuring access to cancer treatment facilities is essential for delivering timely care, yet barriers such as geographic distance, socioeconomic factors, and social disparities can impede access in both rural and urban regions. This study measured locational health access for colorectal cancer in the context of hospital and population distribution in Louisiana. It used data on census tracts, hospital beds, and providers from the National Cancer Institute. By mapping the distribution of these healthcare facilities, the study showed the potential to identify significant challenges in accessing specialized cancer care. There is no existing locational health access metric in this domain. The contribution of this paper is that it calculated the actual road distance between each census tract centroid and each cancer-treating hospital, and offers a new locational health access metric. This metric uses the number of beds and the number of oncologists as a proxy for cancer treatment capacity. The work can be extended to a larger scope (such as the whole country), to more variables, and to other diseases treated by hospitals. It has public policy implications: hospital locations can be chosen through such data-driven analysis.
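The abstract does not give the formula of the proposed metric, so the following is only a minimal sketch of how a distance-and-capacity access score could be computed for one census tract. It assumes a generic gravity-style form (capacity divided by road distance); the function name, the weights, the decay exponent, and the distance floor are all hypothetical and are not taken from the paper.

```python
def access_score(distances_km, beds, oncologists,
                 bed_weight=1.0, oncologist_weight=1.0,
                 decay=1.0, min_distance_km=1.0):
    """Gravity-style access score for a single census tract (illustrative only).

    distances_km: road distance from the tract centroid to each hospital
    beds, oncologists: capacity proxies for each hospital, in the same order
    All weights and the decay exponent are assumptions, not the paper's values.
    """
    score = 0.0
    for d, b, o in zip(distances_km, beds, oncologists):
        capacity = bed_weight * b + oncologist_weight * o
        # Floor the distance so a hospital inside the tract does not divide by zero.
        effective_distance = max(d, min_distance_km)
        score += capacity / (effective_distance ** decay)
    return score


# Two hospitals: 10 km away (100 beds, 5 oncologists) and 20 km away (50 beds, 10 oncologists).
print(access_score([10.0, 20.0], [100, 50], [5, 10]))  # 105/10 + 60/20 = 13.5
```

Under this form, a tract scores higher when large hospitals are close by, which is the qualitative behaviour the abstract describes; the actual metric in the paper may weight or combine these terms differently.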



Citations: 0

Identifying Drug Repurposing Candidates for CLN3 Targeting Proteomics Expression Profile.

Proceedings. IEEE International Conference on Bioinformatics and Biomedicine

Pub Date : 2024-12-01 Epub Date: 2025-01-10 DOI: 10.1109/bibm62325.2024.10822002

Shixue Sun, Rosemary Mejia, An N Dang Do, Qian Zhu

Juvenile neuronal ceroid lipofuscinosis (CLN3) is a rare neurodegenerative disorder lacking effective therapies. This study aimed at developing a drug repurposing approach to identify potential therapeutic candidates for CLN3 using its protein expression profile (CPEP) constructed from proteomics data. Differentially expressed proteins were identified and applied to query the iLINCS database, resulting in 60 FDA-approved drugs with reversal effects on CPEP. These candidates were further prioritized based on regulation strength, coverage, and blood-brain barrier permeability. Top candidates include Vorinostat and Cyclosporine, which have shown promise due to their significant regulation scores and blood-brain barrier permeation probability. These results provide opportunities for further investigation on novel therapies for CLN3.


Citations: 0

Predicting HIV Diagnosis Among Emerging Adults Using Electronic Health Records and Health Survey Data in All of Us Research Program.

Proceedings. IEEE International Conference on Bioinformatics and Biomedicine

Pub Date : 2024-12-01 Epub Date: 2025-01-10 DOI: 10.1109/bibm62325.2024.10822296

Balu Bhasuran, Yiyang Liu, Mattia Prosperi, Karen MacDonell, Sylvie Naar, Zhe He

The global decline in HIV incidence has not been mirrored in the United States, where young adults (ages 18-29) continue to account for a significant portion of new infections. In this study, we leverage the All of Us (AoU) Research Program's extensive electronic health records (EHRs) and health survey data to develop machine learning models capable of predicting HIV diagnoses at least three months before clinical identification. Among various models tested, the Support Vector Machine (SVM) model demonstrated a balanced performance, integrating clinically relevant features with robust predictive accuracy (AUC = 0.91). Risky drinking behaviors emerged as consistent top predictors across models, highlighting the importance of targeted interventions in this age group. Our findings underscore the potential of predictive analytics in enhancing HIV prevention strategies and informing public health efforts aimed at reducing HIV transmission among emerging adults.



Citations: 0

Causal Explanation from Mild Cognitive Impairment Progression using Graph Neural Networks.

Proceedings. IEEE International Conference on Bioinformatics and Biomedicine

Pub Date : 2024-12-01 DOI: 10.1109/bibm62325.2024.10822848

Arman Behnam, Muskan Garg, Xingyi Liu, Maria Vassilaki, Jennifer St Sauver, Ronald C Petersen, Sunghwan Sohn

Mild Cognitive Impairment (MCI) is a transitional stage between normal cognitive aging and dementia. Some individuals with MCI revert to normal, while others progress to dementia. There are limited studies using explainable artificial intelligence on longitudinal data, particularly including genotypes, biomarkers and chronic diseases, to explore these differences. This study introduces a novel approach to understanding MCI progression using explainable graph neural networks. Utilizing longitudinal temporal data, we constructed a comprehensive graph representation of each individual in the study cohort. Our temporal graph convolutional network achieved 72.4% accuracy in predicting MCI transitions, while our causal explanation method outperformed existing explanation techniques in stability, accuracy, and faithfulness. We identified a causal subgraph with informative variables including hypertension, arrhythmia, congestive heart failure, coronary artery disease, stroke, lipid-related issues, and sex.



Citations: 0

An application of studying FAERS data to Enhance Drug Safety and Treatment Outcomes in Rare Diseases.

Proceedings. IEEE International Conference on Bioinformatics and Biomedicine

Pub Date : 2024-12-01 Epub Date: 2025-01-10 DOI: 10.1109/bibm62325.2024.10822513

Jaber Valinejad, Yanji Xu, Qian Zhu

In the United States, a rare disease is one that affects fewer than 200,000 individuals, and some are so rare that only a handful of people are affected. According to the U.S. Food and Drug Administration (FDA), there are 1,268 approved orphan drugs available for treating these conditions. However, potentially beneficial drugs can also have side effects. Some adverse events, while serious, may be rare, making them difficult to identify or quantify in randomized controlled trials. Understanding these events is critical for improving patient safety and treatment outcomes. To better assess these risks, we aimed to summarize adverse drug events for rare diseases using the FDA Adverse Event Reporting System (FAERS). This study offers a foundation for future research on improving drug safety in rare diseases.


Citations: 0

Interpreting Lung Cancer Health Disparity between African American Males and European American Males.

Proceedings. IEEE International Conference on Bioinformatics and Biomedicine

Pub Date : 2024-12-01 DOI: 10.1109/bibm62325.2024.10822014

Masrur Sobhan, Md Mezbahul Islam, Ananda Mohan Mondal

Lung cancer remains a predominant cause of cancer-related deaths, with notable disparities in incidence and outcomes across racial and gender groups. This study addresses these disparities by developing a computational framework leveraging explainable artificial intelligence (XAI) to identify both patient- and cohort-specific biomarker genes in lung cancer. Specifically, we focus on two lung cancer subtypes, Lung Adenocarcinoma (LUAD) and Lung Squamous Cell Carcinoma (LUSC), examining distinct racial and sex-specific cohorts: African American males (AAMs) and European American males (EAMs). This study structures classification tasks based on disease conditions rather than racial labels to avoid race-specific imbalance. We constructed four classification tasks, one three-class problem (LUAD-LUSC-HEALTHY) and three two-class problems (LUAD-LUSC, LUAD-HEALTHY, LUSC-HEALTHY), to interpret the disease behavior of the patients in terms of genes and pathways. This methodology allows a LUAD or LUSC patient to be analyzed via multiple classifications, yielding robust disparity information for every patient. This preliminary work reports the disparity information for LUAD only. Using transcriptome data from The Cancer Genome Atlas (TCGA) and Genotype-Tissue Expression (GTEx) projects, we processed samples for the LUAD, LUSC, and HEALTHY cohorts. We applied machine learning models, including convolutional neural network (CNN), logistic regression (LR), naïve Bayesian classifier (NB), support vector machine (SVM), random forest (RF), and extreme gradient boosting (XGBoost), for the classification. The SHapley Additive exPlanation (SHAP)-based interpretation of the best-performing classification model uncovered cohort-specific genes and pathways related to health disparities between the LUAD-AAM and LUAD-EAM cohorts.



Citations: 0

Benchmarking Distance Functions in Siamese Networks for Current and Prior Mammogram Image Analysis.

Proceedings. IEEE International Conference on Bioinformatics and Biomedicine

Pub Date : 2024-12-01 DOI: 10.1109/bibm62325.2024.10822291

Sahand Hamzehei, Afsana Ahsan Jeny, Annie Jin, Clifford Yang, Sheida Nabavi

Mammogram image analysis has benefited from advancements in artificial intelligence (AI), particularly through the use of Siamese networks, which, similar to radiologists, compare current and prior mammogram images to enhance diagnostic accuracy. One of the main challenges in employing Siamese networks for this purpose is selecting an effective distance function. Given the complexity of mammogram images and the high correlation between current and prior images, traditional distance functions in Siamese networks often fall short in capturing the subtle, non-linear differences between these correlated features. This study explores the impact of incorporating non-linear and correlation-sensitive distance functions within a Siamese network framework for analyzing paired mammogram images. We benchmarked different distance functions, including Euclidean, Manhattan, Mahalanobis, Radial Basis Function (RBF), and cosine, and introduced a novel combination of RBF with Matern Covariance. Our evaluation revealed that the RBF with Matern Covariance consistently outperformed other functions, emphasizing the importance of addressing non-linearity and correlation in this context. For instance, the ResNet50 model, when paired with this distance function, achieved an accuracy of 0.938, sensitivity of 0.921, precision of 0.955, specificity of 0.958, F1 score of 0.930, and AUC of 0.940. We observed similarly strong performance across other models as well. Furthermore, the robustness of our approach was confirmed through evaluation on a dataset of 30 cross-validation samples, demonstrating its generalizability. These findings underscore the effectiveness of non-linear and correlation-based distance functions in Siamese networks for improving the performance and generalization of mammogram image analysis. All code used in this paper is available at https://github.com/NabaviLab/Benchmarking_Distance_Functions_in_Siamese_Networks.
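To make the benchmarked functions concrete, here is a minimal sketch of several of them applied to two embedding vectors, such as the outputs of the two Siamese branches. It is not the authors' implementation (see their repository for that). The Matérn function uses the standard smoothness value ν = 3/2 as an assumption, the `length_scale` parameter is illustrative, and Mahalanobis is omitted because it needs a covariance estimate. The abstract does not say how RBF and Matérn are combined, so no combination is shown.

```python
import math


def euclidean(a, b):
    """L2 distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan(a, b):
    """L1 distance between two equal-length vectors."""
    return sum(abs(x - y) for x, y in zip(a, b))


def cosine_distance(a, b):
    """One minus cosine similarity; assumes neither vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def rbf_similarity(a, b, length_scale=1.0):
    """RBF (Gaussian) kernel: 1.0 for identical vectors, decaying towards 0."""
    r = euclidean(a, b)
    return math.exp(-(r ** 2) / (2.0 * length_scale ** 2))


def matern32_similarity(a, b, length_scale=1.0):
    """Matern kernel with nu = 3/2 (assumed): (1 + s) * exp(-s), s = sqrt(3) * r / l."""
    s = math.sqrt(3.0) * euclidean(a, b) / length_scale
    return (1.0 + s) * math.exp(-s)


current = [0.0, 0.0]
prior = [3.0, 4.0]
print(euclidean(current, prior))   # 5.0
print(manhattan(current, prior))   # 7.0
print(rbf_similarity(current, prior, length_scale=5.0))
print(matern32_similarity(current, prior, length_scale=5.0))
```

Note that the two kernel functions return similarities, not distances: a Siamese head that expects a distance would typically use one minus the kernel value.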



Citations: 0

Addressing Class Imbalance with Latent Diffusion-based Data Augmentation for Improving Disease Classification in Pediatric Chest X-rays.

Proceedings. IEEE International Conference on Bioinformatics and Biomedicine

Pub Date : 2024-12-01 DOI: 10.1109/bibm62325.2024.10822172

Sivaramakrishnan Rajaraman, Zhaohui Liang, Zhiyun Xue, Sameer Antani

Deep learning (DL) has transformed medical image classification; however, its efficacy is often limited by significant data imbalance, because there are far fewer cases (minority class) than controls (majority class). It has been shown that synthetic image augmentation techniques can simulate clinical variability, leading to enhanced model performance. We hypothesize that they could also mitigate the challenge of data imbalance, thereby addressing overfitting to the majority class and enhancing generalization. Recently, latent diffusion models (LDMs) have shown promise in synthesizing high-quality medical images. This study evaluates the effectiveness of a text-guided image-to-image LDM in synthesizing disease-positive chest X-rays (CXRs) and augmenting a pediatric CXR dataset to improve classification performance. We first establish baseline performance by fine-tuning an ImageNet-pretrained Inception-V3 model on class-imbalanced data for two tasks: normal vs. pneumonia and normal vs. bronchopneumonia. Next, we fine-tune individual text-guided image-to-image LDMs to generate CXRs showing signs of pneumonia and bronchopneumonia. The Inception-V3 model is retrained on an updated dataset that includes these synthesized images as part of augmented training and validation sets. Classification performance is compared against the baseline using balanced accuracy, sensitivity, specificity, F-score, Matthews correlation coefficient (MCC), Kappa, and Youden's index. Results show that the augmentation significantly improves Youden's index (p<0.05) and markedly enhances other metrics, indicating that data augmentation using LDM-synthesized images is an effective strategy for addressing class imbalance in medical image classification.
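The evaluation metrics listed above all derive from the four cells of a binary confusion matrix. The sketch below uses the standard textbook definitions; the function name and the example counts are illustrative and are not taken from the paper. It assumes every denominator is non-zero, i.e. both classes appear among the labels and among the predictions.

```python
import math


def binary_metrics(tp, fp, tn, fn):
    """Standard binary classification metrics from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)            # recall on the positive (case) class
    specificity = tn / (tn + fp)            # recall on the negative (control) class
    precision = tp / (tp + fp)
    balanced_accuracy = (sensitivity + specificity) / 2
    f_score = 2 * precision * sensitivity / (precision + sensitivity)
    youden = sensitivity + specificity - 1  # Youden's index (J statistic)
    mcc = (tp * tn - fp * fn) / math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
    )
    n = tp + fp + tn + fn
    observed = (tp + tn) / n                # observed agreement
    expected = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / n ** 2
    kappa = (observed - expected) / (1 - expected)  # Cohen's kappa
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "balanced_accuracy": balanced_accuracy,
        "f_score": f_score,
        "youden": youden,
        "mcc": mcc,
        "kappa": kappa,
    }


# Illustrative imbalanced example: 50 cases and 100 controls.
for name, value in binary_metrics(tp=40, fp=10, tn=90, fn=10).items():
    print(f"{name}: {value:.3f}")
```

Balanced accuracy, Youden's index, and MCC are less sensitive to class imbalance than plain accuracy, which is presumably why they are reported for this task.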


{"title":"Addressing Class Imbalance with Latent Diffusion-based Data Augmentation for Improving Disease Classification in Pediatric Chest X-rays.","authors":"Sivaramakrishnan Rajaraman, Zhaohui Liang, Zhiyun Xue, Sameer Antani","doi":"10.1109/bibm62325.2024.10822172","DOIUrl":"10.1109/bibm62325.2024.10822172","url":null,"abstract":"Deep learning (DL) has transformed medical image classification; however, its efficacy is often limited by significant data imbalance due to far fewer cases (minority class) compared to controls (majority class). It has been shown that synthetic image augmentation techniques can simulate clinical variability, leading to enhanced model performance. We hypothesize that they could also mitigate the challenge of data imbalance, thereby addressing overfitting to the majority class and enhancing generalization. Recently, latent diffusion models (LDMs) have shown promise in synthesizing high-quality medical images. This study evaluates the effectiveness of a text-guided image-to-image LDM in synthesizing disease-positive chest X-rays (CXRs) and augmenting a pediatric CXR dataset to improve classification performance. We first establish baseline performance by fine-tuning an ImageNet-pretrained Inception-V3 model on class-imbalanced data for two tasks-normal vs. pneumonia and normal vs. bronchopneumonia. Next, we fine-tune individual text-guided image-to-image LDMs to generate CXRs showing signs of pneumonia and bronchopneumonia. The Inception-V3 model is retrained on an updated data set that includes these synthesized images as part of augmented training and validation sets. Classification performance is compared using balanced accuracy, sensitivity, specificity, F-score, Matthews correlation coefficient (MCC), Kappa, and Youden's index against the baseline performance. 
Results show that the augmentation significantly improves Youden's index (p<0.05) and markedly enhances other metrics, indicating that data augmentation using LDM-synthesized images is an effective strategy for addressing class imbalance in medical image classification.","PeriodicalId":74563,"journal":{"name":"Proceedings. IEEE International Conference on Bioinformatics and Biomedicine","volume":"2024 ","pages":"5059-5066"},"PeriodicalIF":0.0,"publicationDate":"2024-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11936509/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143712499","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
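The evaluation metrics named in the abstract above can all be derived from a binary confusion matrix. The following is a minimal sketch of those formulas, not the authors' evaluation code; the function name `imbalance_metrics` and the example counts are hypothetical and chosen only for illustration.

```python
import math


def _ratio(numerator, denominator):
    """Return numerator / denominator, or 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def imbalance_metrics(tp, fp, tn, fn):
    """Compute imbalance-aware metrics from binary confusion-matrix counts.

    tp/fn count disease-positive images, tn/fp count normal images.
    """
    sensitivity = _ratio(tp, tp + fn)   # recall on the positive class
    specificity = _ratio(tn, tn + fp)   # recall on the negative class
    precision = _ratio(tp, tp + fp)

    balanced_accuracy = (sensitivity + specificity) / 2
    f1 = _ratio(2 * precision * sensitivity, precision + sensitivity)
    youden_j = sensitivity + specificity - 1

    # Matthews correlation coefficient
    mcc = _ratio(
        tp * tn - fp * fn,
        math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)),
    )

    # Cohen's kappa: observed agreement corrected for chance agreement
    n = tp + fp + tn + fn
    p_observed = _ratio(tp + tn, n)
    p_expected = _ratio((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp), n * n)
    kappa = _ratio(p_observed - p_expected, 1 - p_expected)

    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "balanced_accuracy": balanced_accuracy,
        "f1": f1,
        "mcc": mcc,
        "kappa": kappa,
        "youden_j": youden_j,
    }


if __name__ == "__main__":
    # Hypothetical counts for an imbalanced test set (50 positives, 100 negatives).
    for name, value in imbalance_metrics(tp=40, fp=10, tn=90, fn=10).items():
        print(f"{name}: {value:.3f}")
```

Unlike plain accuracy, balanced accuracy, MCC, and Youden's index weight both classes, which is why they are commonly reported when the majority class dominates.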

Citations: 0

Context-Aware Contrastive Representation Learning for Zero-Shot Biomedical Text Classification.

Proceedings. IEEE International Conference on Bioinformatics and Biomedicine

Pub Date : 2024-12-01 DOI: 10.1109/bibm62325.2024.10822585

Ratri Mukherjee, Kishlay Jha

Biomedical text classification refers to the task of annotating a biomedical text with its relevant labels from a candidate label set. Most existing approaches operate in a fully supervised setting and thus rely heavily on human-annotated training data, which is both labor-intensive and monetarily expensive. To address this, we propose to formulate biomedical text classification under the zero-shot learning (ZSL) paradigm, which does not require any labeled training data and relies only on label surface names for training and inference. Specifically, we propose a new context-aware contrastive learning technique for ZSL that fully exploits the context information present in the biomedical text to generate the semantically enriched feature representations needed for accurate zero-shot biomedical text classification. Unlike existing contrastive learning approaches, which typically employ random text segmentation strategies to generate contrastive pairs, our approach utilizes the context information inherently present in biomedical text to generate semantically meaningful contrastive pairs. Extensive experiments on the largest available biomedical corpus validate the effectiveness of the proposed approach.
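To make the idea of classifying from label surface names concrete, here is a minimal sketch of zero-shot inference by similarity between a text and each candidate label name. It is not the authors' model: the `embed` function is a toy bag-of-words stand-in for a learned, contrastively trained encoder, and all function names and example strings are hypothetical.

```python
import math
from collections import Counter


def embed(text):
    """Toy stand-in for a learned encoder: lowercase bag-of-words counts."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def zero_shot_classify(text, label_names):
    """Score each label by similarity to the text; no labeled examples are used."""
    doc_vec = embed(text)
    scores = {label: cosine(doc_vec, embed(label)) for label in label_names}
    best = max(scores, key=scores.get)
    return best, scores


if __name__ == "__main__":
    labels = ["lung cancer", "heart disease", "diabetes mellitus"]
    text = "mutation of the tumor suppressor gene in lung cancer"
    best, scores = zero_shot_classify(text, labels)
    print(best, scores)
```

In a realistic setting the encoder would map texts and label names into a shared dense space, so that a label can match a document even when they share no words; the toy encoder here only matches on exact token overlap.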



Citations: 0