A Gaussian Model for Feature Selection in Protein Fold Recognition
Authors: P. Shiguihara-Juárez, Nils Murrugarra-Llerena
Venue: 2018 IEEE Sciences and Humanities International Research Conference (SHIRCON)
Publication date: 2018-11-01
DOI: 10.1109/SHIRCON.2018.8593155 (https://doi.org/10.1109/SHIRCON.2018.8593155)
Citation count: 0
Abstract
Protein fold recognition is an important task for discovering new biological functions of proteins. Machine learning techniques have been applied to protein fold recognition by framing it as a classification problem. In many cases, however, the similarity among fold patterns makes recognition a complex task and limits the performance of these techniques. In this paper, we propose a feature selection method that supports machine learning methods for protein fold recognition by using Gaussian distributions in the feature analysis. We cluster the features according to Gaussian distributions, and the resulting clusters provide information for reducing the dimensionality of the feature set. We then apply baseline classifiers to protein fold recognition on a well-known dataset for this task. The results suggest that clustering features and reducing their dimensionality with Gaussian distributions can help improve the accuracy of machine learning techniques on this task.
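The abstract does not specify how features are clustered by Gaussian distributions, so the following is only a minimal sketch under stated assumptions, not the authors' method. It assumes each feature is summarized by a univariate Gaussian (mean and standard deviation), that two features belong to the same cluster when the symmetrized KL divergence between their Gaussians falls below a threshold, and that one representative feature is kept per cluster. The function names, the greedy grouping rule, and the `threshold` parameter are all hypothetical choices made for illustration.

```python
import numpy as np


def gaussian_params(X):
    """Fit a univariate Gaussian (mean, std) to each feature column."""
    mu = X.mean(axis=0)
    sigma = X.std(axis=0) + 1e-12  # small offset guards against constant features
    return mu, sigma


def symmetric_kl(mu1, s1, mu2, s2):
    """Symmetrized KL divergence between two univariate Gaussians."""
    kl12 = np.log(s2 / s1) + (s1**2 + (mu1 - mu2) ** 2) / (2 * s2**2) - 0.5
    kl21 = np.log(s1 / s2) + (s2**2 + (mu1 - mu2) ** 2) / (2 * s1**2) - 0.5
    return kl12 + kl21


def cluster_features(X, threshold=0.1):
    """Greedy grouping (an assumption, not the paper's rule): a feature joins
    the first cluster whose representative Gaussian is closer than
    `threshold`; otherwise it starts a new cluster."""
    mu, sigma = gaussian_params(X)
    representatives = []  # column index of the first feature in each cluster
    labels = np.empty(X.shape[1], dtype=int)
    for j in range(X.shape[1]):
        for c, r in enumerate(representatives):
            if symmetric_kl(mu[j], sigma[j], mu[r], sigma[r]) < threshold:
                labels[j] = c
                break
        else:
            # No existing cluster was close enough: open a new one.
            representatives.append(j)
            labels[j] = len(representatives) - 1
    return labels, representatives


def select_features(X, threshold=0.1):
    """Reduce dimensionality by keeping one representative column per cluster."""
    _, representatives = cluster_features(X, threshold)
    return X[:, representatives], representatives
```

The reduced matrix returned by `select_features` could then be passed to any baseline classifier. Note that this sketch ignores class labels and compares only marginal distributions, so two features with similar Gaussians but different information content would be merged; a real implementation would need to address that.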