Machine learning-based identification and validation of immune-related biomarkers for early diagnosis and targeted therapy in diabetic retinopathy
Yulin Tao, Minqi Xiong, Yirui Peng, Lili Yao, Haibo Zhu, Qiong Zhou, Jun Ouyang
DOI: 10.1016/j.gene.2024.149015
Published: 2024-10-18
https://www.sciencedirect.com/science/article/pii/S0378111924008965
Abstract
Early diagnosis of diabetic retinopathy (DR) is challenging, which makes the identification of new biomarkers an urgent need. Immune responses play a crucial role in DR, yet no study has so far reported using machine learning (ML) algorithms to develop immune-related molecular markers for DR. Using the datasets GSE102485 and GSE160306, differentially expressed genes (DEGs) were screened with Weighted Gene Co-expression Network Analysis (WGCNA). Five ML algorithms (Bayesian, Learning Vector Quantization (LVQ), Wrapper (Boruta), Random Forest (RF), and Logistic Regression) were employed to select immune-related genes associated with DR (DR.Sig). Seven ML algorithms (Naive Bayes (NB), RF, Support Vector Machine (SVM), AdaBoost Classification Trees (AdaBoost), Boosted Logistic Regressions (LogitBoost), K-Nearest Neighbors (KNN), and Cancerclass) were used to construct a predictive model for DR. The relationship between DR.Sig genes and immune cells was analyzed using single-sample Gene Set Enrichment Analysis (ssGSEA). Drug sensitivity prediction for the DR.Sig genes and molecular docking were also performed. The five ML algorithms identified six immune-related biomarkers closely related to the occurrence of DR: FCGR2B, CSRP1, EDNRA, SDC2, TEK, and CIITA. The DR predictive model built on these six DR.Sig genes with the Cancerclass algorithm showed better predictive performance than four previously published DR-related biomarkers. In vivo and in vitro experiments also validated the expression of the six genes in DR. These genes correlated positively with 22 types of immune cells. Molecular docking showed that CSRP1, EDNRA, and TEK had the highest affinities for the small-molecule compounds etoposide, FR-139317, and camptothecin, respectively. Models built with various ML algorithms can effectively predict the occurrence of DR and hold potential for targeted drug therapies, providing a basis for the early diagnosis and targeted treatment of DR.
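
The abstract does not state how the outputs of the five selection algorithms were combined into the final DR.Sig gene set. One common approach, shown here only as a minimal sketch and not as the authors' procedure, is a vote across selectors: a gene is kept if at least a given number of algorithms chose it, with the strict intersection as the default. The function name `consensus_genes`, the `min_votes` parameter, and the placeholder gene symbols (`GENE_A` and so on) are all hypothetical and do not come from the study.

```python
from collections import Counter


def consensus_genes(selections, min_votes=None):
    """Return genes chosen by at least `min_votes` selectors, sorted by name.

    selections: mapping of selector name -> iterable of gene symbols.
    min_votes:  defaults to the number of selectors (strict intersection).
    """
    if not selections:
        return []
    if min_votes is None:
        min_votes = len(selections)
    votes = Counter()
    for genes in selections.values():
        # De-duplicate so each selector casts at most one vote per gene.
        votes.update(set(genes))
    return sorted(gene for gene, n in votes.items() if n >= min_votes)


# Toy input with placeholder symbols; these are NOT results from the paper.
toy_selections = {
    "bayesian": ["GENE_A", "GENE_B", "GENE_C"],
    "lvq": ["GENE_A", "GENE_B", "GENE_D"],
    "boruta": ["GENE_B", "GENE_A"],
    "random_forest": ["GENE_A", "GENE_B", "GENE_C"],
    "logistic_regression": ["GENE_B", "GENE_A", "GENE_E"],
}

strict = consensus_genes(toy_selections)                # chosen by all five
relaxed = consensus_genes(toy_selections, min_votes=2)  # chosen by two or more
print(strict)   # ['GENE_A', 'GENE_B']
print(relaxed)  # ['GENE_A', 'GENE_B', 'GENE_C']
```

A strict intersection favors a small, stable signature, while lowering `min_votes` trades stability for a larger candidate list; which threshold suits a given dataset is a judgment call.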