Regression Model for Better Generalization and Regression Analysis
Mohiuddeen Khan, Kanishk Srivastava
Proceedings of the 4th International Conference on Machine Learning and Soft Computing, 2020-01-17. DOI: 10.1145/3380688.3380691
Citations: 3
Abstract
Polynomial regression models may fail to optimize well on the training instances and generalize poorly to new instances: a small polynomial degree causes high bias (underfitting), while a high degree causes high variance (overfitting). A single low-degree hypothesis curve cannot fit all the training instances when the data exhibit repeated changes in curvature, i.e. when the plotted points rise and fall around several local extrema. These local extrema make it difficult for a low-degree hypothesis curve to pass through all the training instances. Better optimization and generalization can be achieved by splitting the hypothesis curve at its local extrema (local maxima and minima) and fitting a separate regression model on each maximum-to-minimum or minimum-to-maximum interval. Because no local extremum lies inside an interval, the curvature changes very little within it, so fewer training instances are needed to fit each model. Training time falls with the reduced number of instances per fit, making the model far less computationally expensive. Tested on UCI Machine Learning Repository datasets, the algorithm achieved 92.06% accuracy versus 53.47% for plain polynomial regression on the Combined Cycle Power Plant Data Set [1], and 96.33% versus 85.41% on the Real Estate Valuation Data Set [2]. The approach may also benefit mathematical studies of the bias-variance trade-off, cost minimization, and curve fitting in statistics.
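The idea described in the abstract can be sketched as follows: detect the local extrema of the sorted training points, then fit an independent low-degree polynomial on each extremum-to-extremum interval. This is a minimal illustrative sketch, not the authors' implementation; the function names (`find_extrema`, `piecewise_poly_fit`, `piecewise_predict`) and the choice of `np.polyfit` per segment are assumptions made here for clarity.

```python
import numpy as np

def find_extrema(y):
    # Indices where the sequence changes direction (local maxima/minima).
    return [i for i in range(1, len(y) - 1)
            if (y[i] - y[i - 1]) * (y[i + 1] - y[i]) < 0]

def piecewise_poly_fit(x, y, degree=3):
    # Sort by x, split the curve at its local extrema, and fit a separate
    # low-degree polynomial on each extremum-to-extremum interval.
    order = np.argsort(x)
    x, y = x[order], y[order]
    breaks = [0] + find_extrema(y) + [len(x) - 1]
    segments = []
    for a, b in zip(breaks[:-1], breaks[1:]):
        xs, ys = x[a:b + 1], y[a:b + 1]
        d = min(degree, len(xs) - 1)  # guard against tiny segments
        segments.append((xs[0], xs[-1], np.polyfit(xs, ys, d)))
    return segments

def piecewise_predict(segments, x_new):
    # Evaluate the polynomial of the segment whose x-range contains each point;
    # points outside all segments use the nearest boundary segment.
    preds = np.empty(len(x_new), dtype=float)
    for i, xv in enumerate(x_new):
        for lo, hi, coeffs in segments:
            if lo <= xv <= hi:
                preds[i] = np.polyval(coeffs, xv)
                break
        else:
            c = segments[0][2] if xv < segments[0][0] else segments[-1][2]
            preds[i] = np.polyval(c, xv)
    return preds
```

On a sine-like curve, a single low-degree polynomial underfits, but the three extremum-bounded arcs are each nearly polynomial, so the piecewise fit tracks the curve closely with small per-segment degrees.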