Regression Model for Better Generalization and Regression Analysis
Mohiuddeen Khan, Kanishk Srivastava
Proceedings of the 4th International Conference on Machine Learning and Soft Computing, 2020-01-17. DOI: 10.1145/3380688.3380691
Citations: 3
Abstract
Polynomial regression models may fail to optimize well on the training instances and generalize poorly to new instances: a small polynomial degree causes high bias (underfitting), while a high degree causes high variance (overfitting). A single low-degree hypothesis curve cannot fit all the training instances when the data exhibit repeated changes in curvature, i.e. when the plotted points rise and fall around several local extrema. These local extrema make it difficult for a low-degree hypothesis curve to pass through all the training instances. Better optimization and generalization can be achieved by splitting the hypothesis curve at its local extrema (local maxima and minima) and fitting a separate regression model on each maximum-to-minimum or minimum-to-maximum interval. Because no local extremum lies inside an interval, the curvature changes very little within it, so fewer training instances are needed to fit each model. Training time falls with the reduced number of instances per fit, making the model far less computationally expensive. Tested on UCI Machine Learning Repository datasets, the algorithm achieved 92.06% accuracy versus 53.47% for plain polynomial regression on the Combined Cycle Power Plant Data Set [1], and 96.33% versus 85.41% on the Real Estate Valuation Data Set [2]. The approach may also benefit mathematical studies of the bias-variance trade-off, cost minimization, and curve fitting in statistics.
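The idea described in the abstract can be sketched as follows: detect the local extrema of the sorted training points, then fit an independent low-degree polynomial on each extremum-to-extremum interval. This is a minimal illustrative sketch, not the authors' implementation; the function names (`find_extrema`, `piecewise_poly_fit`, `piecewise_predict`) and the choice of `np.polyfit` per segment are assumptions made here for clarity.

```python
import numpy as np

def find_extrema(y):
    # Indices where the sequence changes direction (local maxima/minima).
    return [i for i in range(1, len(y) - 1)
            if (y[i] - y[i - 1]) * (y[i + 1] - y[i]) < 0]

def piecewise_poly_fit(x, y, degree=3):
    # Sort by x, split the curve at its local extrema, and fit a separate
    # low-degree polynomial on each extremum-to-extremum interval.
    order = np.argsort(x)
    x, y = x[order], y[order]
    breaks = [0] + find_extrema(y) + [len(x) - 1]
    segments = []
    for a, b in zip(breaks[:-1], breaks[1:]):
        xs, ys = x[a:b + 1], y[a:b + 1]
        d = min(degree, len(xs) - 1)  # guard against tiny segments
        segments.append((xs[0], xs[-1], np.polyfit(xs, ys, d)))
    return segments

def piecewise_predict(segments, x_new):
    # Evaluate the polynomial of the segment whose x-range contains each point;
    # points outside all segments use the nearest boundary segment.
    preds = np.empty(len(x_new), dtype=float)
    for i, xv in enumerate(x_new):
        for lo, hi, coeffs in segments:
            if lo <= xv <= hi:
                preds[i] = np.polyval(coeffs, xv)
                break
        else:
            c = segments[0][2] if xv < segments[0][0] else segments[-1][2]
            preds[i] = np.polyval(c, xv)
    return preds
```

On a sine-like curve, a single low-degree polynomial underfits, but the three extremum-bounded arcs are each nearly polynomial, so the piecewise fit tracks the curve closely with small per-segment degrees.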