Luis Diego Mora-Jimenez, Oscar Azofeifa-Segura, J. Guevara-Coto
2019 International Conference on Computational Science and Computational Intelligence (CSCI), 2019-12-01. DOI: 10.1109/CSCI49370.2019.00274
Functional Annotations of Novel Cancer-Associated lncRNAs Identified Using Machine Learning Algorithms
Cancer is a set of diseases that result from deregulated cell growth and invasion of adjacent tissues. As research has expanded, more information has become available on the potential causes of cancer, including non-coding elements such as lncRNAs. Machine learning methods can uncover this knowledge by extracting information from data such as gene expression profiles and identifying new cancer-associated genes. In this work we use two machine learning algorithms, random forests (RF) and support vector machines (SVM). We trained the models and tested two fine-tuning methods: balancing and feature selection. The best-performing predictors were the balanced RF with Boruta (AUC-ROC: 0.9696) and the balanced SVM with recursive feature elimination (AUC-ROC: 0.9710). These models were used to identify new potential lncRNA driver-like genes from protein-coding expression data. The predicted candidates were then functionally annotated using disease ontologies and molecular function ontologies to determine their enrichment in cancer-related processes. The enriched terms included prostate cancer and glycosaminoglycan binding, a potential tumor therapeutic target.
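Two quantities in the abstract can be made concrete with a small sketch: the AUC-ROC used to rank the predictors, and the enrichment of an ontology term among predicted candidates. The abstract does not state how either was computed, so everything below is an assumption: the function names `auc_roc` and `enrichment_p_value` are hypothetical, AUC-ROC is computed from its pairwise-ranking definition, enrichment is scored with a one-sided hypergeometric over-representation test, and the toy numbers are invented for illustration rather than taken from the paper.

```python
from math import comb


def auc_roc(labels, scores):
    """AUC-ROC as the probability that a random positive outranks a random negative.

    labels: iterable of 0/1 class labels (1 = cancer-associated, by assumption).
    scores: iterable of classifier scores, higher meaning "more likely positive".
    Ties between a positive and a negative count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def enrichment_p_value(population, annotated, selected, overlap):
    """One-sided hypergeometric p-value, P(X >= overlap).

    population: number of genes in the background set.
    annotated:  background genes carrying the ontology term.
    selected:   number of predicted candidate genes.
    overlap:    candidates that carry the term.
    """
    upper = min(annotated, selected)
    tail = sum(
        comb(annotated, k) * comb(population - annotated, selected - k)
        for k in range(overlap, upper + 1)
    )
    return tail / comb(population, selected)


if __name__ == "__main__":
    # Toy data, invented for illustration only.
    toy_labels = [1, 0, 1, 0]
    toy_scores = [0.9, 0.8, 0.3, 0.1]
    print("AUC-ROC:", auc_roc(toy_labels, toy_scores))

    # Hypothetical counts: 10 background genes, 3 annotated, 3 selected, all 3 overlap.
    print("enrichment p-value:", enrichment_p_value(10, 3, 3, 3))
```

In practice a library routine such as scikit-learn's `roc_auc_score` would replace the quadratic pairwise loop, and enrichment p-values would normally be corrected for multiple testing across ontology terms; both steps are left out here to keep the sketch short.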