{"title":"Tackling the Accuracy-Interpretability Trade-off: Interpretable Deep Learning Models for Satellite Image-based Real Estate Appraisal","authors":"Jan-Peter Kucklick, Oliver Müller","doi":"10.1145/3567430","DOIUrl":null,"url":null,"abstract":"Deep learning models fuel many modern decision support systems, because they typically provide high predictive performance. Among other domains, deep learning is used in real-estate appraisal, where it allows extending the analysis from hard facts only (e.g., size, age) to also consider more implicit information about the location or appearance of houses in the form of image data. However, one downside of deep learning models is their intransparent mechanic of decision making, which leads to a trade-off between accuracy and interpretability. This limits their applicability for tasks where a justification of the decision is necessary. Therefore, in this article, we first combine different perspectives on interpretability into a multi-dimensional framework for a socio-technical perspective on explainable artificial intelligence. Second, we measure the performance gains of using multi-view deep learning, which leverages additional image data (satellite images) for real estate appraisal. Third, we propose and test a novel post hoc explainability method called Grad-Ram. This modified version of Grad-Cam mitigates the intransparency of convolutional neural networks for predicting continuous outcome variables. With this, we try to reduce the accuracy-interpretability trade-off of multi-view deep learning models. Our proposed network architecture outperforms traditional hedonic regression models by 34% in terms of MAE. Furthermore, we find that the used satellite images are the second most important predictor after square feet in our model and that the network learns interpretable patterns about the neighborhood structure and density.","PeriodicalId":45274,"journal":{"name":"ACM Transactions on Management Information Systems","volume":"14 1","pages":"1 - 24"},"PeriodicalIF":2.5000,"publicationDate":"2022-10-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"ACM Transactions on Management Information Systems","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3567430","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"COMPUTER SCIENCE, INFORMATION SYSTEMS","Score":null,"Total":0}
引用次数: 1
Abstract
Deep learning models fuel many modern decision support systems, because they typically provide high predictive performance. Among other domains, deep learning is used in real-estate appraisal, where it allows extending the analysis from hard facts only (e.g., size, age) to also consider more implicit information about the location or appearance of houses in the form of image data. However, one downside of deep learning models is their intransparent mechanic of decision making, which leads to a trade-off between accuracy and interpretability. This limits their applicability for tasks where a justification of the decision is necessary. Therefore, in this article, we first combine different perspectives on interpretability into a multi-dimensional framework for a socio-technical perspective on explainable artificial intelligence. Second, we measure the performance gains of using multi-view deep learning, which leverages additional image data (satellite images) for real estate appraisal. Third, we propose and test a novel post hoc explainability method called Grad-Ram. This modified version of Grad-Cam mitigates the intransparency of convolutional neural networks for predicting continuous outcome variables. With this, we try to reduce the accuracy-interpretability trade-off of multi-view deep learning models. Our proposed network architecture outperforms traditional hedonic regression models by 34% in terms of MAE. Furthermore, we find that the used satellite images are the second most important predictor after square feet in our model and that the network learns interpretable patterns about the neighborhood structure and density.