Mustafa Erhan Özer, Pemra Özbek Sarica, Kazım Yalçın Arğa
Turkish journal of biology = Turk biyoloji dergisi, vol. 47, no. 6, pp. 349-365. Published 2023-12-14. DOI: 10.55730/1300-0152.2670. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11045210/pdf/
SVM-DO: identification of tumor-discriminating mRNA signatures via support vector machines supported by Disease Ontology.
Background/aim: The complexity of tumor formation makes it difficult to identify discriminatory genes. Recently, transcriptome-based supervised classification methods using support vector machines (SVMs) have become popular in this field. However, including less significant variables when constructing classification models can lead to misclassification. To improve model performance, feature selection methods such as enrichment analysis can be used to extract useful variable sets. Little is known about genes that can discriminate between normal and tumor samples in the context of the association between cancer and disease. We therefore aimed to discover novel and practical sets of discriminatory biomarkers by utilizing this association.
Materials and methods: In this study, we applied SVM classification to differentially expressed genes enriched by Disease Ontology, filtering out nondiscriminatory features with Wilks' lambda criterion before classification. Our approach uses the discovery of disease-associated genes as a viable strategy to identify gene sets that discriminate between tumor and normal states. We analyzed the performance of our algorithm using comprehensive RNA-Seq data for adenocarcinoma of the colon, squamous cell carcinoma of the lung, and adenocarcinoma of the lung. The classification performance of the obtained gene sets was assessed on different expression datasets and compared with previous studies that used the same datasets.
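As a rough illustration of the filtering step only, the sketch below computes a univariate Wilks' lambda per gene (within-group sum of squares divided by total sum of squares) and keeps genes below a cutoff. It is written in Python rather than R, and the function names, the 0.5 threshold, the gene names, and the toy expression values are all assumptions made for this example; the abstract does not state the exact form or cutoff used in SVM-DO.

```python
from statistics import fmean


def wilks_lambda(values, labels):
    """Univariate Wilks' lambda: within-group sum of squares / total sum of squares.

    Values near 0 mean the groups are well separated on this feature;
    values near 1 mean the feature barely discriminates between groups.
    """
    grand_mean = fmean(values)
    total_ss = sum((v - grand_mean) ** 2 for v in values)
    if total_ss == 0:
        return 1.0  # constant feature: no discriminating power
    within_ss = 0.0
    for group in set(labels):
        members = [v for v, lab in zip(values, labels) if lab == group]
        group_mean = fmean(members)
        within_ss += sum((v - group_mean) ** 2 for v in members)
    return within_ss / total_ss


def filter_genes(expression, labels, threshold=0.5):
    """Keep genes whose Wilks' lambda is below a hypothetical threshold."""
    return [gene for gene, values in expression.items()
            if wilks_lambda(values, labels) < threshold]


# Toy expression values, made up for illustration: four samples per gene.
labels = ["normal", "normal", "tumor", "tumor"]
expression = {
    "GENE_A": [1.0, 1.2, 5.0, 5.2],  # clearly separates the two groups
    "GENE_B": [2.0, 4.0, 2.1, 3.9],  # overlaps heavily between groups
}
selected = filter_genes(expression, labels)
print(selected)
```

In this toy input only GENE_A passes the filter, because nearly all of its variance lies between the normal and tumor groups. In a full pipeline, the retained genes would then be passed to an SVM classifier.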
Results: Our algorithm extracts small, stable gene sets that predict cancer status with high accuracy. In addition, the gene sets generated by our method perform well in survival analyses, indicating their potential for prognosis.
Conclusion: By combining gene sets for both diagnosis and prognosis, our method can improve clinical applications in cancer research. Our algorithm is available as an R package with a graphical user interface in Bioconductor (https://doi.org/10.18129/B9.bioc.SVMDO) and GitHub (https://github.com/robogeno/SVMDO).