Feature Selection and Modeling using Statistical and Machine learning Methods
Sofia D'souza, P. V., Balaji S
2020 IEEE International Conference on Distributed Computing, VLSI, Electrical Circuits and Robotics (DISCOVER), published 2020-10-30
DOI: https://doi.org/10.1109/DISCOVER50404.2020.9278093
Citation count: 0
Abstract
Feature selection is a necessary step in machine learning regression problems; it aims to find a reduced set of relevant features. In this research, we assessed the performance of three learning models on a quantitative structure-activity relationship (QSAR) dataset. The models were built from pools of features chosen by three variable selection techniques. The results indicate that the final models built from statistically significant features exhibit improved predictive performance. Further, the partial least squares (PLS) model showed better predictive performance on the external test set than the other learning models.