Changing the Location Game – Improving Location Analytics with the Help of Explainable AI

Journal of Real Estate Research (IF 1.2; Zone 4, Economics; JCR Q3, Business, Finance) · Pub Date: 2023-09-29 · DOI: 10.1080/08965803.2023.2258012

Moritz Stang, Bastian Krämer, Marcelo Cajias, Wolfgang Schäfers



Abstract

Besides its structural and economic characteristics, the location of a property is probably one of the most important determinants of its underlying value. In contrast to property valuations, there are hardly any approaches to date that evaluate the quality of a real estate location in an automated manner. The reasons are the complexity, the number of interactions and the non-linearities underlying the quality specifications of a certain location. By combining a state-of-the-art machine learning algorithm and the local post-hoc model-agnostic method of Shapley Additive Explanations, this paper introduces a newly developed approach, called the SHAP location score (SHAP-LS), that is able to detect these complexities and enables assessing real estate locations in a data-based manner. The SHAP location score represents an intuitive and flexible approach based on econometric modeling techniques and the basic assumptions of hedonic pricing theory. The approach can be applied post-hoc to any common machine learning method and can be flexibly adapted to the respective needs. This constitutes a significant extension of traditional urban models and offers many advantages for a wide range of real estate players.

Keywords: Location Analytics; Explainable AI; Machine Learning; Shapley Values; Automated Location Valuation Model

Disclosure Statement

No potential conflict of interest was reported by the author(s).

Notes

1. This term describes the fact that this technique is applied after the actual training of an algorithm (= post-hoc) and can be applied to different algorithms (= model-agnostic).

2. In the context of the SHAP-LS methodology, it is in principle possible to use either purchase or rental prices. Both reflect the observable willingness to pay for a property with certain characteristics and a certain location and can thus be used interchangeably in this logic.

3. An example of the identification of aggregated results would be the Permutation Feature Importance (see e.g., Krämer, Nagl, et al., 2023).

4. Theoretically, the SHAP-LS of single features could be used for this kind of analysis. However, this is not recommended, as one is exposed to the capriciousness of the algorithms and data providers. Often, locational features such as the distance to the next bus stop and to the next subway station correlate highly. Consequently, the algorithm cannot distinguish perfectly between these correlated features, which can lead to a blurring of the individual SHAP values. Another reason that should not be neglected is the dependence on how the data providers categorize the location characteristics. In some cases, individual amenities overlap considerably, e.g., the classification of restaurants, pubs or bars. Combining several individual characteristics into categories can counteract this blurring. As a rule of thumb, the more data are available, the smaller the categories that can be used.

5. It should be noted that the overall and categorical SHAP-LSs can also be utilized without scaling, providing absolute values instead of relative comparisons. This alternative application of the SHAP-LS methodology, the absolute SHAP-LS, leads to the derivation of absolute contributions of location-specific features, expressed in the unit of the property's price. This interpretation presents a quantifiable measure of the value placed on the location characteristics compared to the average prediction by market participants, thus providing an additional perspective for analysis. It also broadens the scope for data interpretation, thereby offering new ways for investigation within the realms of real estate valuation and location quality assessment. An exemplary implementation of this can be found in Appendix VI.

6. A further application in other asset classes, such as retail, industrial, hotel, or office, is also theoretically possible but beyond the scope of this study.

7. Since the SHAP-LS methodology is used to analyze the extent of influence of individual location-descriptive features, there is no need to transform the negative POIs. No statement is made or needed beforehand regarding whether a greater distance to these POIs generally has a positive or negative impact on the quality of a location. The term "negative POIs" is simply chosen because they are generally perceived as negative by humans. The logic and use of these POIs are therefore arbitrary in relation to the other POIs.

8. To obtain SHAP values for the features of interest, the shap package (https://shap.readthedocs.io/en/latest/index.html) is used.

9. To ensure the reliability of results obtained from the newly introduced SHAP-LS across various model specifications, hyperparameter choices, and other potential variations, two distinct robustness checks are performed. The results can be seen in Appendix III.

10. To verify that the SHAP-LS does not merely reflect the rental price per square meter, the correlation between the SHAP-LS and the rent per square meter is calculated. A coefficient of 0.58 signifies that the SHAP-LS effectively captures the relative attractiveness of a location, rather than relying solely on prices. The presence of a discernible correlation between the appeal of a location and the rental rates should not be regarded as unexpected.

11. A corresponding example can be seen in Appendix IV.

12. Another potential application for practitioners would be to integrate the SHAP-LS into an automated real estate valuation process to further improve its valuation accuracy. An empirical example of this can be found in Appendix V.

13. As outlined in the methodology section, the SHAP-LS technique can be employed for location analysis without the necessity for scaling to gain further insights. An illustrative example of this approach, which we call absolute SHAP-LS, can be found in Appendix VI.

14. In the context of the SHAP-LS, the correlation between different features can be seen as only a minor concern because the results are presented at an aggregated overall or group level. The individual interpretation challenges of specific features are thus avoided.

15. Compared to the empirical example conducted in the earlier part of the study, this illustrative demonstration utilized data not only from the year 2020 but also from the years 2021 and 2022. This allows for the implementation of the moving window approach and further enables a more robust testing of the results.

16. To utilize the SHAP-LSs in the context of automated real estate valuation, they need to be computed beforehand. To enable a fair comparison, the calculation of SHAP-LSs for the test data is performed out-of-sample. In the first step, following the logic outlined in the methodology section, the SHAP-LSs are calculated for a training dataset. In the second step, the SHAP-LSs for the unseen test data are determined using a k-nearest neighbors approach (k = 5) based on the previously computed SHAP-LSs of the training data. Therefore, our approach can be seen as a feature selection and feature aggregation method, as it systematically identifies and incorporates the most relevant locational variables into the model. By concentrating on key features, our methodology minimizes the risk of overfitting, leading to models that potentially generalize better to new data.

17. The NUTS (Nomenclature of territorial units for statistics) classification is a hierarchical system for dividing up the economic territory of the EU and the UK. Overall, there are four different subdivision levels, called NUTS-0, NUTS-1, NUTS-2 and NUTS-3. The NUTS-3 regions in the UK constitute local administrative units including counties, unitary authorities and London boroughs. For a more detailed description of the NUTS regions, we refer to Krämer, Stang, et al. (2023).

18. The same logic is applied to calculate the SHAP-LSs.
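
Notes 5 and 16 describe how per-feature SHAP values are aggregated into categorical and overall scores, and how scores for unseen listings are derived from the five nearest training observations. The following is a minimal sketch of that logic in plain Python, not the authors' implementation. The feature names, the category grouping, the toy SHAP values and the coordinates are hypothetical. Summing SHAP values within a category, rescaling to 0 to 100 by min-max, measuring neighbor distance on coordinates and averaging the neighbors' scores are all assumptions, since the abstract and notes do not specify these details. In the paper, the SHAP values themselves are obtained with the shap package from a trained model; here they are simply given.

```python
import math

# Hypothetical grouping of location features into categories.
CATEGORIES = {
    "transport": ["dist_bus", "dist_subway"],
    "gastronomy": ["dist_restaurant", "dist_bar"],
}

def absolute_shap_ls(shap_row, categories=CATEGORIES):
    """Sum SHAP values per category and overall (unscaled, in price units).

    Assumption: a score is the plain sum of the SHAP values it covers.
    """
    scores = {cat: sum(shap_row[f] for f in feats)
              for cat, feats in categories.items()}
    scores["overall"] = sum(scores.values())
    return scores

def min_max_scale(values, lo=0.0, hi=100.0):
    """Rescale scores to [lo, hi]; an assumed scaling for relative comparison."""
    vmin, vmax = min(values), max(values)
    if vmax == vmin:  # all listings identical: place them mid-range
        return [(lo + hi) / 2 for _ in values]
    return [lo + (v - vmin) * (hi - lo) / (vmax - vmin) for v in values]

def knn_shap_ls(train_coords, train_scores, query, k=5):
    """Out-of-sample score: mean score of the k nearest training listings.

    Assumptions: Euclidean distance on coordinates, unweighted mean.
    """
    order = sorted(range(len(train_coords)),
                   key=lambda i: math.dist(train_coords[i], query))
    nearest = order[:k]
    return sum(train_scores[i] for i in nearest) / len(nearest)

# Toy SHAP values for six training listings (hypothetical, in price units).
train_shap = [
    {"dist_bus": 0.5, "dist_subway": 1.0, "dist_restaurant": 0.25, "dist_bar": 0.25},
    {"dist_bus": 0.25, "dist_subway": 0.5, "dist_restaurant": 0.0, "dist_bar": 0.25},
    {"dist_bus": 0.0, "dist_subway": 0.5, "dist_restaurant": 0.25, "dist_bar": 0.25},
    {"dist_bus": -0.25, "dist_subway": 0.25, "dist_restaurant": 0.0, "dist_bar": 0.0},
    {"dist_bus": -0.5, "dist_subway": -0.25, "dist_restaurant": -0.25, "dist_bar": 0.0},
    {"dist_bus": -1.0, "dist_subway": -0.5, "dist_restaurant": -0.25, "dist_bar": -0.25},
]
train_coords = [(0, 0), (1, 0), (0, 1), (1, 1), (2, 2), (5, 5)]

# Step 1: absolute and scaled overall SHAP-LS on the training data.
absolute_overall = [absolute_shap_ls(row)["overall"] for row in train_shap]
scaled_overall = min_max_scale(absolute_overall)

# Step 2: transfer the scaled score to an unseen listing via k = 5 neighbors.
test_score = knn_shap_ls(train_coords, scaled_overall, query=(0.5, 0.5), k=5)

print(absolute_overall)
print(scaled_overall)
print(test_score)
```

Keeping the unscaled sums alongside the rescaled scores mirrors the distinction the notes draw between the absolute SHAP-LS (price units, relative to the average prediction) and the scaled score used for relative comparison.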




Source journal: Journal of Real Estate Research

CiteScore: 1.40

Self-citation rate: 12.50%

About the journal: The American Real Estate Society (ARES), founded in 1985, is an association of real estate thought leaders. Members are drawn from academia and the profession at large, both in the United States and internationally. The Society is dedicated to producing and disseminating knowledge related to real estate decision making and the functioning of real estate markets. The objectives of the American Real Estate Society are to encourage research and promote education in real estate, improve communication and exchange of information in real estate and allied matters among college/university faculty and practicing professionals, and facilitate the association of academic, practicing professional, and research persons in the area of real estate.