Differential gene expression profile evaluation between uterine leiomyoma and leiomyosarcoma using a machine learning approach
Sonal Upadhyay, Ravi Bhushan, Anima Tripathi, Lavina Chaubey, Amita Diwakar, Pawan K. Dubey
Gynecology and Obstetrics Clinical Medicine, volume 3, issue 3, pages 154-162. Published 2023-09-01. DOI: 10.1016/j.gocm.2023.08.003
https://www.sciencedirect.com/science/article/pii/S2667164623000672
Abstract
Objective
This study aims to differentiate uterine leiomyomas (ULM) from uterine leiomyosarcomas (ULMS) through molecular differential analysis and to identify potential prognostic biomarkers that could aid diagnosis.
Methods
Two microarray datasets (GEO accession numbers GSE64763 and GSE185543) were retrieved from the Gene Expression Omnibus database. The data were preprocessed and analyzed for differentially expressed genes (DEGs). The DEG lists were intersected to find the DEGs common to ULM and ULMS, and the selected DEGs were then validated. A machine learning classifier was also applied to biomarker selection. A protein-protein interaction network was constructed using STRING v10.5. Additionally, Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment analyses were performed to dissect possible functions and pathways.
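The intersection step described above can be illustrated with a minimal Python sketch. Everything specific in it is an assumption for illustration only: the gene names (`GENE_A` and so on), the log2 fold-change and adjusted p-values, the cutoffs (|log2FC| >= 1, adjusted p < 0.05), and the requirement that a common DEG change in the same direction in both lists. The abstract does not state the thresholds or tooling the authors used.

```python
# Illustrative sketch only: synthetic values, hypothetical gene names and cutoffs.

def significant_degs(table, lfc_cutoff=1.0, padj_cutoff=0.05):
    """Keep genes passing both the fold-change and adjusted p-value cutoffs.

    `table` maps gene symbol -> (log2 fold change, adjusted p-value).
    Returns gene symbol -> log2 fold change for the significant genes.
    """
    return {
        gene: lfc
        for gene, (lfc, padj) in table.items()
        if abs(lfc) >= lfc_cutoff and padj < padj_cutoff
    }


def common_degs(degs_a, degs_b):
    """Intersect two DEG sets and split the shared genes by direction.

    Assumption: a shared gene counts only if it moves the same way in both lists.
    """
    shared = degs_a.keys() & degs_b.keys()
    up = sorted(g for g in shared if degs_a[g] > 0 and degs_b[g] > 0)
    down = sorted(g for g in shared if degs_a[g] < 0 and degs_b[g] < 0)
    return up, down


# Synthetic differential-expression tables (not data from the study).
ulm_table = {
    "GENE_A": (2.1, 0.001),
    "GENE_B": (-1.8, 0.01),
    "GENE_C": (0.4, 0.2),    # fails both cutoffs in this list
    "GENE_D": (1.5, 0.03),
}
ulms_table = {
    "GENE_A": (3.0, 0.0005),
    "GENE_B": (-2.2, 0.002),
    "GENE_C": (1.9, 0.01),
    "GENE_E": (-1.2, 0.04),
}

up, down = common_degs(significant_degs(ulm_table), significant_degs(ulms_table))
print("common up-regulated:", up)
print("common down-regulated:", down)
```

Only genes that pass the cutoffs in both lists survive the intersection, which is why a list of common DEGs can be much shorter than either input list.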
Results
A total of 50 significant DEGs were identified for ULM and 321 for ULMS, each with an official gene symbol. Between ULM and ULMS, 14 common DEGs were identified, of which 8 were up-regulated and 6 were down-regulated. The DEGs of GSE185543 were also analyzed, and the significant genes common to both datasets were retained for further analysis. Using a machine learning approach, 10 feature genes were identified. Using the expression profiles of these genes, a sequential minimal optimization (SMO) prediction model was built on the training set, and it accurately and reliably classified the expression of these features in ULM and ULMS in the independent test set. Furthermore, co-enrichment analysis was performed.
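To make the classification step concrete, the following is a minimal sketch of a simplified SMO trainer for a linear soft-margin support vector machine, fitted on a training set and checked on held-out samples. It is not the authors' implementation, which the abstract does not specify. The two-feature expression values, the label coding (ULM = -1, ULMS = +1), the parameters `C`, `tol`, `max_passes`, and the rule for picking the second multiplier (largest error gap) are all assumptions chosen for illustration.

```python
# Illustrative sketch only: simplified SMO on synthetic two-gene expression values.

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def smo_train(X, y, C=1.0, tol=1e-3, max_passes=5, max_iter=1000):
    """Simplified SMO for a linear soft-margin SVM; labels must be -1 or +1."""
    n = len(X)
    K = [[dot(X[i], X[j]) for j in range(n)] for i in range(n)]  # linear kernel
    alpha, b = [0.0] * n, 0.0

    def err(i):
        # Prediction error on training sample i under the current multipliers.
        return sum(alpha[k] * y[k] * K[k][i] for k in range(n)) + b - y[i]

    passes = iters = 0
    while passes < max_passes and iters < max_iter:
        iters += 1
        changed = 0
        for i in range(n):
            Ei = err(i)
            violates = (y[i] * Ei < -tol and alpha[i] < C) or (
                y[i] * Ei > tol and alpha[i] > 0
            )
            if not violates:
                continue
            # Second multiplier: the sample with the largest error gap (assumed heuristic).
            j = max((k for k in range(n) if k != i), key=lambda k: abs(Ei - err(k)))
            Ej = err(j)
            ai_old, aj_old = alpha[i], alpha[j]
            # Box constraints that keep both multipliers in [0, C].
            if y[i] != y[j]:
                lo, hi = max(0.0, aj_old - ai_old), min(C, C + aj_old - ai_old)
            else:
                lo, hi = max(0.0, ai_old + aj_old - C), min(C, ai_old + aj_old)
            eta = 2 * K[i][j] - K[i][i] - K[j][j]
            if lo == hi or eta >= 0:
                continue
            aj = min(hi, max(lo, aj_old - y[j] * (Ei - Ej) / eta))
            if abs(aj - aj_old) < 1e-5:
                continue
            ai = ai_old + y[i] * y[j] * (aj_old - aj)
            b1 = b - Ei - y[i] * (ai - ai_old) * K[i][i] - y[j] * (aj - aj_old) * K[i][j]
            b2 = b - Ej - y[i] * (ai - ai_old) * K[i][j] - y[j] * (aj - aj_old) * K[j][j]
            b = b1 if 0 < ai < C else b2 if 0 < aj < C else (b1 + b2) / 2
            alpha[i], alpha[j] = ai, aj
            changed += 1
        # Stop after max_passes consecutive sweeps without any update.
        passes = passes + 1 if changed == 0 else 0

    w = [sum(alpha[k] * y[k] * X[k][d] for k in range(n)) for d in range(len(X[0]))]
    return w, b


def predict(w, b, x):
    return 1 if dot(w, x) + b >= 0 else -1


# Synthetic expression values for two hypothetical feature genes.
X_train = [[1, 1], [2, 1], [1, 2], [5, 5], [6, 5], [5, 6]]
y_train = [-1, -1, -1, 1, 1, 1]   # -1 = ULM, +1 = ULMS (assumed coding)
X_test = [[2, 2], [6, 6]]         # held-out samples, not used in training
y_test = [-1, 1]

w, b = smo_train(X_train, y_train)
predictions = [predict(w, b, x) for x in X_test]
print("test predictions:", predictions)
```

SMO updates two Lagrange multipliers at a time so that each sub-problem has a closed-form solution. The simplified variant shown here can stop before reaching the exact maximum-margin solution, which is acceptable for a sketch but is one reason production SVM libraries use more careful working-set selection.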
Conclusion
The study identified several DEGs, including ZNF365, EPYC, COL11A1, SHOX2, MMP13, TNN, GPM6A, and GATA2, through cross-validation, a machine learning classifier, and co-enrichment analysis. These candidate disease genes may provide valuable insight into the underlying mechanisms and could serve as potential diagnostic biomarkers for ULM and ULMS. However, further validation of these genes is necessary to better understand their roles in the pathogenesis of ULM and ULMS.