Big Data Classification and Machine Learning Using Zillow Estimates
Si-Hao Du, Yi Gu, Yuewei Zhu
2021 International Conference on Signal Processing and Machine Learning (CONF-SPML), 2021-11-01
DOI: 10.1109/CONF-SPML54095.2021.00056 (https://doi.org/10.1109/CONF-SPML54095.2021.00056)
Citation count: 0
Abstract
Zillow is a real estate company that relies on the estimated value of a house to set its price. The log error of a prediction is the difference between the logarithm of the predicted price and the logarithm of the actual sale price. The goal of this work is to minimize this error and thereby improve prediction accuracy. Because the real estate dataset has many missing feature values, data preprocessing plays a significant role in this work. In addition, particularly important features are selected, and several machine learning models (Decision Tree, Random Forest, and Linear Regression) are applied for prediction. In conclusion, Linear Regression performs better than the other two models. Future work, such as feature engineering, could further improve the accuracy.
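The log error defined in the abstract, and the handling of missing feature values, can be illustrated with a minimal sketch. The function names below are hypothetical and do not come from the paper. The natural logarithm, the mean absolute log error as the aggregate score, and median imputation as the fill strategy are all assumptions made for illustration, since the abstract does not specify them.

```python
import math
from statistics import median


def log_error(predicted_price, sale_price):
    """Log error of one prediction: log(prediction) - log(actual sale price).

    Assumption: natural logarithm; the abstract does not state the base.
    """
    return math.log(predicted_price) - math.log(sale_price)


def mean_absolute_log_error(predicted, actual):
    """Average magnitude of the log error over a set of houses.

    Assumption: the abstract only says the error is minimized; using the
    mean absolute value as the aggregate is an illustrative choice.
    """
    errors = [abs(log_error(p, a)) for p, a in zip(predicted, actual)]
    return sum(errors) / len(errors)


def impute_median(values):
    """Fill blank (None) entries of one feature with the median of the rest.

    Assumption: median imputation is one common preprocessing option, not
    necessarily the method used in the paper.
    """
    observed = [v for v in values if v is not None]
    fill = median(observed)
    return [fill if v is None else v for v in values]


if __name__ == "__main__":
    predicted = [310_000.0, 455_000.0, 198_000.0]  # made-up example prices
    actual = [300_000.0, 470_000.0, 205_000.0]
    print(mean_absolute_log_error(predicted, actual))
    print(impute_median([1500.0, None, 2100.0, 1800.0]))
```

A log error of zero means the prediction matches the sale price exactly. A positive value indicates an overestimate and a negative value an underestimate, independent of the absolute price level.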