Title: A diagnostic model for sepsis using an integrated machine learning framework approach and its therapeutic drug discovery
Authors: Wuping Zhang, Hanping Shi, Jie Peng
Journal: BMC Infectious Diseases, volume 25, issue 1, article 219
Publication date: 2025-02-14
Publication type: Journal Article
DOI: 10.1186/s12879-025-10616-z (https://doi.org/10.1186/s12879-025-10616-z)
Impact factor: 3.4 (JCR Q2, Infectious Diseases)
Citations: 0
Abstract
Background: Sepsis remains a life-threatening condition in intensive care units (ICUs), with high morbidity and mortality. Some biomarkers commonly used in clinical practice neither rise rapidly and specifically nor decline rapidly after effective treatment. Machine learning has shown great potential for early diagnosis, subtype analysis, precise treatment, and prognostic evaluation of sepsis.
Methods: Gene expression matrices from GSE13904 and GSE26440 were combined into a training set after quality control and standardization. Shared genes were then obtained by intersecting the screened differentially expressed genes (DEGs) with the genes of the most strongly correlated module identified by weighted gene co-expression network analysis (WGCNA). A total of 113 combinations of machine learning algorithms were used to build a diagnostic model. The CIBERSORT algorithm was then used to analyze the relationship between changes in core gene expression and the immune response in sepsis. A nomogram, decision curve analysis (DCA), and a clinical impact curve (CIC) were constructed to further verify the reliability of the diagnostic model. Potential molecular compounds interacting with the key genes were searched for in the Traditional Chinese Medicine Active Compound Library (TCMACL).
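The intersection step described in the Methods can be illustrated with a minimal Python sketch. It is not the authors' code: the gene names, the fold-change and p-value cutoffs, and the helper names `select_degs` and `intersect_candidates` are all hypothetical, since the abstract does not state the thresholds or tooling used.

```python
# Illustrative only: gene names and cutoffs are placeholders, not study values.

def select_degs(stats, lfc_cutoff=1.0, p_cutoff=0.05):
    """Return genes whose |log2 fold change| and adjusted p-value pass the cutoffs."""
    return {
        gene
        for gene, (log2_fc, adj_p) in stats.items()
        if abs(log2_fc) >= lfc_cutoff and adj_p < p_cutoff
    }


def intersect_candidates(degs, module_genes):
    """Keep only genes that are both DEGs and members of the chosen module."""
    return sorted(set(degs) & set(module_genes))


# Toy differential-expression table: gene -> (log2 fold change, adjusted p-value)
toy_stats = {
    "GENE_A": (2.3, 0.001),
    "GENE_B": (-1.4, 0.010),
    "GENE_C": (0.2, 0.400),
    "GENE_D": (1.8, 0.200),
}
# Toy stand-in for the genes of the most strongly correlated WGCNA module
toy_module = ["GENE_A", "GENE_C", "GENE_B", "GENE_E"]

degs = select_degs(toy_stats)
candidates = intersect_candidates(degs, toy_module)
print(candidates)
```

Representing both gene lists as sets makes the overlap a single `&` operation, and sorting the result keeps the output order reproducible.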
Results: We identified 405 DEGs, comprising 334 up-regulated and 71 down-regulated genes. Intersecting the MEturquoise module genes from WGCNA with the DEGs yielded 308 candidate genes for subsequent machine learning analysis. GO and KEGG enrichment analyses showed that sepsis was mainly related to immune response and bacterial infection. The 113 combinations of machine learning algorithms were then applied to construct a diagnostic model, which screened out 22 hub genes. Four key genes (CD177, GNLY, ANKRD22, and IFIT1) were obtained through further analysis of the protein-protein interaction (PPI) network constructed from the 22 hub genes. The nomogram, DCA, and CIC then indicated that the diagnostic model had good predictive value. Finally, three molecular compounds (Dieckol, Grosvenorine, and Tellimagrandin II) were screened out as potential drugs.
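For readers unfamiliar with DCA, the quantity it plots is the net benefit of a model at a given risk threshold. The sketch below uses the standard net-benefit definition, TP/n - (FP/n) * pt/(1 - pt), on made-up labels and probabilities. It is an assumption-laden illustration, not the study's analysis: the function names and toy data are hypothetical.

```python
# Illustrative only: labels and predicted probabilities are toy values.

def net_benefit(y_true, y_prob, threshold):
    """Net benefit of treating everyone whose predicted risk is >= threshold."""
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must lie strictly between 0 and 1")
    n = len(y_true)
    tp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 0)
    # Odds of the threshold: how many false positives one true positive is worth
    weight = threshold / (1.0 - threshold)
    return tp / n - (fp / n) * weight


def net_benefit_treat_all(y_true, threshold):
    """Reference strategy: classify every patient as positive."""
    return net_benefit(y_true, [1.0] * len(y_true), threshold)


toy_labels = [1, 1, 1, 0, 0, 0, 1, 0]
toy_probs = [0.9, 0.8, 0.6, 0.4, 0.3, 0.7, 0.2, 0.1]

for pt in (0.25, 0.5):
    model_nb = net_benefit(toy_labels, toy_probs, pt)
    all_nb = net_benefit_treat_all(toy_labels, pt)
    print(f"threshold={pt}: model={model_nb:.3f}, treat-all={all_nb:.3f}")
```

A decision curve repeats this calculation across a range of thresholds; a model is considered useful where its net benefit exceeds both the treat-all strategy and the treat-none strategy, whose net benefit is zero.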
Conclusion: The 113 combinations of machine learning algorithms screened out four key genes that can distinguish patients with sepsis. In addition, potential therapeutic molecular compounds interacting with the key genes were identified by molecular docking.
About the journal:
BMC Infectious Diseases is an open access, peer-reviewed journal that considers articles on all aspects of the prevention, diagnosis and management of infectious and sexually transmitted diseases in humans, as well as related molecular genetics, pathophysiology, and epidemiology.