Seema Sandeep Redekar, Satishkumar L. Varma, Atanu Bhattacharjee
Computer Methods and Programs in Biomedicine Update, vol. 2, Article 100051 (2022). doi: 10.1016/j.cmpbup.2022.100051. https://www.sciencedirect.com/science/article/pii/S2666990022000039
Identification of key genes associated with survival of glioblastoma multiforme using integrated analysis of TCGA datasets
Background and Objective
Glioblastoma (GBM) is the most aggressive type of brain tumor. Despite the various treatment options available, GBM patients usually have a poor prognosis. Genetic markers play a vital role in the progression of the disease, and identifying novel molecular biomarkers is essential to explain its mechanisms and improve the prognosis of GBM. Advances in high-throughput genomic technologies enable the analysis of varied types of omics data to find biomarkers in GBM. Although data repositories such as The Cancer Genome Atlas (TCGA) are rich sources of multi-omics data, integrating genomic datasets that differ in quality and patient heterogeneity is challenging.
Methods
Multi-omics datasets from TCGA, comprising DNA methylation, RNA sequencing, and copy number variation (CNV) data for GBM patients, are obtained for the analysis. A Cox proportional hazards regression model is developed in R to identify, from these diverse datasets, the genes significantly associated with patient survival. The Akaike Information Criterion (AIC) and the Bayesian Information Criterion (BIC) are used to assess the model. Validation is performed to determine the accuracy and the corresponding prediction error.
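The following is a minimal sketch of the kind of model described above, not the authors' R code. It fits a Cox proportional hazards model with a single covariate by Newton-Raphson on the Breslow partial log-likelihood and then computes AIC and BIC. The survival times, event flags, and expression values are invented toy data; the function names are hypothetical; and taking the number of observed events as the sample size in BIC is an assumption made here.

```python
import math

def cox_partial_loglik(beta, times, events, x):
    """Breslow partial log-likelihood, gradient and Hessian (one covariate)."""
    loglik = grad = hess = 0.0
    for i, (t_i, d_i) in enumerate(zip(times, events)):
        if not d_i:
            continue  # censored subjects contribute only through risk sets
        # Risk set: everyone still under observation at time t_i.
        risk = [j for j, t_j in enumerate(times) if t_j >= t_i]
        w = [math.exp(beta * x[j]) for j in risk]
        s0 = sum(w)
        s1 = sum(w_j * x[j] for w_j, j in zip(w, risk))
        s2 = sum(w_j * x[j] ** 2 for w_j, j in zip(w, risk))
        loglik += beta * x[i] - math.log(s0)
        grad += x[i] - s1 / s0
        hess -= s2 / s0 - (s1 / s0) ** 2
    return loglik, grad, hess

def fit_cox(times, events, x, max_iter=50, tol=1e-10):
    """Maximise the partial likelihood with Newton-Raphson, starting at 0."""
    beta = 0.0
    for _ in range(max_iter):
        _, grad, hess = cox_partial_loglik(beta, times, events, x)
        if hess == 0.0:
            break
        step = grad / hess
        beta -= step
        if abs(step) < tol:
            break
    loglik, _, _ = cox_partial_loglik(beta, times, events, x)
    return beta, loglik

def aic(loglik, k):
    return 2 * k - 2 * loglik

def bic(loglik, k, n):
    return k * math.log(n) - 2 * loglik

# Toy data (assumed for illustration): follow-up time in months,
# event flag (1 = death observed, 0 = censored), one gene's expression.
times = [5, 8, 12, 14, 20, 25, 30, 36]
events = [1, 1, 1, 0, 1, 1, 0, 1]
expression = [0.2, 1.5, 0.4, 1.1, 1.9, 0.8, 2.3, 1.0]

beta_hat, loglik_hat = fit_cox(times, events, expression)
n_events = sum(events)  # assumption: events count as the BIC sample size
aic_val = aic(loglik_hat, k=1)
bic_val = bic(loglik_hat, k=1, n=n_events)
print(f"beta={beta_hat:.3f} hazard ratio={math.exp(beta_hat):.3f}")
print(f"AIC={aic_val:.3f} BIC={bic_val:.3f}")
```

A negative coefficient corresponds to a hazard ratio below 1, meaning higher expression goes with lower hazard. When several candidate models are fitted to the same patients, the one with the lower AIC or BIC is preferred.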
Results
Five key genes are identified from each of the DNA methylation and RNA sequencing datasets: ANK1, HOXA9, TOX2, CXCR6, and PIGZ from DNA methylation, and L3MBTL, KDM5B, CCDC138, NUS1P1, and ARHGAP42 from RNA sequencing. Higher expression values of these genes are associated with better survival of GBM patients, and Kaplan-Meier curves illustrate this association. Low values of AIC and BIC indicate the suitability of the model. The prediction model is validated on the test set and shows a low error rate. Copy number variation data are also analysed to find significant chromosomal locations in GBM patients; these lie on chromosomes 2, 5, 6, 7, 12, and 13. In all, nine CNV locations are found to influence the progression of GBM.
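As an illustration of how a Kaplan-Meier comparison of this kind can be computed, the sketch below implements the product-limit estimator and applies it to patients split at the median expression of one gene. The data are invented toy values, the median split is an assumption made here rather than a detail taken from the study, and the function name is hypothetical.

```python
from statistics import median

def kaplan_meier(times, events):
    """Product-limit estimate: list of (event time, survival probability)."""
    curve = []
    surv = 1.0
    # Only distinct times at which at least one event occurred change S(t).
    for t in sorted({u for u, d in zip(times, events) if d}):
        at_risk = sum(1 for u in times if u >= t)
        deaths = sum(1 for u, d in zip(times, events) if u == t and d)
        surv *= 1.0 - deaths / at_risk
        curve.append((t, surv))
    return curve

# Toy data (assumed for illustration): follow-up time in months,
# event flag (1 = death observed, 0 = censored), one gene's expression.
times = [5, 8, 12, 14, 20, 25, 30, 36]
events = [1, 1, 1, 0, 1, 1, 0, 1]
expression = [0.2, 1.5, 0.4, 1.1, 1.9, 0.8, 2.3, 1.0]

# Split patients into high and low expression groups at the median.
cut = median(expression)
high = [i for i, v in enumerate(expression) if v > cut]
low = [i for i, v in enumerate(expression) if v <= cut]

km_high = kaplan_meier([times[i] for i in high], [events[i] for i in high])
km_low = kaplan_meier([times[i] for i in low], [events[i] for i in low])

for label, curve in (("high", km_high), ("low", km_low)):
    print(label, [(t, round(s, 3)) for t, s in curve])
```

Plotting each list as a step function gives the familiar survival curves. A formal comparison between the two groups would normally add a log-rank test, which is omitted here.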
Conclusion
An integrated analysis of multiple omics datasets is carried out to identify significant genes from the DNA methylation and RNA sequencing profiles of 76 individuals common to both datasets. The copy number variation dataset for the same patients is analyzed to identify notable locations across 22 chromosomes. The survival analysis relates these biomarkers to the progression of the disease.