Regression Model for Better Generalization and Regression Analysis

Mohiuddeen Khan, Kanishk Srivastava
Proceedings of the 4th International Conference on Machine Learning and Soft Computing, published 2020-01-17
DOI: 10.1145/3380688.3380691 (https://doi.org/10.1145/3380688.3380691)
Citations: 3

Abstract

Regression models such as polynomial regression may optimize poorly on the training instances and generalize poorly to new instances: a low polynomial degree leads to high bias (underfitting), while a high polynomial degree leads to high variance (overfitting). A low-degree hypothesis curve cannot fit all the training instances, because the curvature of the curve traced by the data points changes repeatedly and the curve alternates between increasing and decreasing around its local extrema. Better optimization and generalization can be achieved by splitting the curve at its local extrema, i.e. its local maxima and local minima, and fitting a separate regression model to each maximum-to-minimum or minimum-to-maximum interval. Because such an interval contains no local extremum, the curvature changes very little within it, so fewer training instances are needed to fit the model. Training on fewer instances also reduces the running time, which makes the model much less computationally expensive. When tested on datasets from the UCI Machine Learning Repository, polynomial regression gave an accuracy of 53.47% and our algorithm 92.06% on the Combined Cycle Power Plant Data Set [1]; on the Real estate valuation Data Set [2], polynomial regression gave 85.41% and our algorithm 96.33%. The approach may benefit work on the bias-variance tradeoff, cost minimization, and curve fitting in statistics.
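The interval-splitting idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation, whose details the abstract does not give. It assumes one-dimensional, noise-free data sorted by x, detects extrema from sign changes in consecutive differences, and fits one low-degree polynomial per interval with NumPy. The function names `find_extrema`, `fit_piecewise`, and `predict_piecewise` are hypothetical, and the sine data is synthetic rather than either UCI dataset.

```python
import numpy as np

def find_extrema(y):
    """Return indices where y switches between increasing and decreasing."""
    d = np.sign(np.diff(y))
    # A sign change between consecutive differences marks a local max or min.
    return np.where(d[:-1] * d[1:] < 0)[0] + 1

def fit_piecewise(x, y, degree=3):
    """Fit one polynomial per interval between consecutive extrema (assumed approach)."""
    cuts = np.concatenate(([0], find_extrema(y), [len(x) - 1]))
    models = []
    for lo, hi in zip(cuts[:-1], cuts[1:]):
        # Adjacent intervals share their boundary point.
        coeffs = np.polyfit(x[lo:hi + 1], y[lo:hi + 1], degree)
        models.append((x[lo], x[hi], coeffs))
    return models

def predict_piecewise(models, x_new):
    """Evaluate each point with the model of the interval it falls into."""
    x_new = np.asarray(x_new, dtype=float)
    uppers = np.array([m[1] for m in models[:-1]])
    idx = np.searchsorted(uppers, x_new, side="left")
    out = np.empty_like(x_new)
    for i, (_, _, coeffs) in enumerate(models):
        mask = idx == i
        out[mask] = np.polyval(coeffs, x_new[mask])
    return out

# Synthetic, noise-free example data (an assumption made for illustration only).
x = np.linspace(0, 4 * np.pi, 400)
y = np.sin(x)

models = fit_piecewise(x, y, degree=3)
global_coeffs = np.polyfit(x, y, 3)  # single polynomial of the same degree

mse_piecewise = np.mean((predict_piecewise(models, x) - y) ** 2)
mse_global = np.mean((np.polyval(global_coeffs, x) - y) ** 2)
print("intervals:", len(models))
print("piecewise MSE:", mse_piecewise)
print("global MSE:", mse_global)
```

On noisy data, raw differences would report many spurious extrema, so some smoothing step would likely be needed before splitting; how the paper handles noise and multi-feature datasets is not stated in the abstract.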