J. Pérez-Rave, J. C. Correa-Morales, Favián González-Echavarría
{"title":"A machine learning approach to big data regression analysis of real estate prices for inferential and predictive purposes","authors":"J. Pérez-Rave, J. C. Correa-Morales, Favián González-Echavarría","doi":"10.1080/09599916.2019.1587489","DOIUrl":null,"url":null,"abstract":"ABSTRACT The hedonic price regressions have mainly been used for inference. In contrast, machine learning employed on big data has a great potential for prediction. To contribute to the integration of these two strategies, this article proposes a machine learning approach to the regression analysis of big data, viz. real estate prices, for both inferential and predictive purposes. The methodology incorporates a new procedure of selecting variables, called ‘incremental sample with resampling’ (MINREM). The methodology is tested on two cases. The first is data from web advertisements selling used homes in Colombia (61,826 observations). The second considers the data (58,888 observations) from a sample of the Metropolitan American Housing Survey 2011 obtained and prepared by a reference study. The methodology consists of two stages. The first chooses the important variables under MINREM; the second focuses on the traditional training and validation procedure for machine learning, adding three activities. In both test cases, the methodology shows its value for obtaining highly parsimonious and stable models for different sample sizes, as well as taking advantage of the inferential and predictive use of the obtained regression functions. This paper contributes to an original methodology for big data regression analysis.","PeriodicalId":45726,"journal":{"name":"Journal of Property Research","volume":null,"pages":null},"PeriodicalIF":2.1000,"publicationDate":"2019-01-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1080/09599916.2019.1587489","citationCount":"55","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Journal of Property Research","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1080/09599916.2019.1587489","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"URBAN STUDIES","Score":null,"Total":0}
引用次数: 55
Abstract
ABSTRACT The hedonic price regressions have mainly been used for inference. In contrast, machine learning employed on big data has a great potential for prediction. To contribute to the integration of these two strategies, this article proposes a machine learning approach to the regression analysis of big data, viz. real estate prices, for both inferential and predictive purposes. The methodology incorporates a new procedure of selecting variables, called ‘incremental sample with resampling’ (MINREM). The methodology is tested on two cases. The first is data from web advertisements selling used homes in Colombia (61,826 observations). The second considers the data (58,888 observations) from a sample of the Metropolitan American Housing Survey 2011 obtained and prepared by a reference study. The methodology consists of two stages. The first chooses the important variables under MINREM; the second focuses on the traditional training and validation procedure for machine learning, adding three activities. In both test cases, the methodology shows its value for obtaining highly parsimonious and stable models for different sample sizes, as well as taking advantage of the inferential and predictive use of the obtained regression functions. This paper contributes to an original methodology for big data regression analysis.
期刊介绍:
The Journal of Property Research is an international journal. The title reflects the expansion of research, particularly applied research, into property investment and development. The Journal of Property Research publishes papers in any area of real estate investment and development. These may be theoretical, empirical, case studies or critical literature surveys.