Interpreting Deep Neural Networks through Prototype Factorization
Subhajit Das, Panpan Xu, Zeng Dai, A. Endert, Liu Ren
2020 International Conference on Data Mining Workshops (ICDMW), November 2020
DOI: https://doi.org/10.1109/ICDMW51313.2020.00068
Citations: 8
Abstract
Typical deep neural networks (DNNs) are complex black-box models, and their decision-making process can be difficult to comprehend even for experienced machine learning practitioners. Therefore, their use could be limited in mission-critical scenarios despite state-of-the-art performance on many challenging ML tasks. Through this work, we empower users to interpret DNNs with a post-hoc analysis protocol. We propose ProtoFac, an explainable matrix factorization technique that decomposes the latent representations at any selected layer in a pre-trained DNN into a collection of weighted prototypes, which are a small number of exemplars extracted from the original data (e.g. image patches, shapelets). Using the factorized weights and prototypes, we build a surrogate model for interpretation by replacing the corresponding layer in the neural network. We identify a number of desired properties of ProtoFac, including authenticity, interpretability, and simplicity, and propose the optimization objective and training procedure accordingly. The method is model-agnostic and can be applied to DNNs with varying architectures. It goes beyond per-sample feature-based explanation by providing prototypes as a condensed set of evidence used by the model for decision making. We applied ProtoFac to interpret pretrained DNNs for a variety of ML tasks, including time series classification on electrocardiograms and image classification. The results show that ProtoFac is able to extract meaningful prototypes to explain the models' decisions while truthfully reflecting the models' operation. We also evaluated human interpretability through Amazon Mechanical Turk (MTurk), showing that ProtoFac is able to produce interpretable and user-friendly explanations.
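To make the core mechanism concrete, here is a minimal NumPy sketch of the general idea the abstract describes: approximating a layer's latent activation matrix A as W @ H, where W holds non-negative prototype weights and every row of H is constrained to be an actual latent vector from the data (the authenticity property), so each prototype traces back to a real exemplar. This is a hedged illustration under assumed names (prototype_factorize, k, n_iters are all hypothetical), not the authors' published objective or implementation.

```python
import numpy as np

def prototype_factorize(A, k, n_iters=30, seed=0):
    """Approximate A (n_samples x latent_dim) as W @ H with W >= 0,
    where each row of H is an actual row of A, so every prototype
    corresponds to a real exemplar in the data."""
    rng = np.random.default_rng(seed)
    n, _ = A.shape
    # Initialize prototypes as randomly chosen real latent vectors.
    H = A[rng.choice(n, size=k, replace=False)].copy()
    W = np.zeros((n, k))
    for _ in range(n_iters):
        # Weight update: least squares, then project onto W >= 0.
        W = np.clip(A @ np.linalg.pinv(H), 0.0, None)
        # Prototype update: unconstrained optimum, then snap each row
        # back to the nearest real latent vector (authenticity).
        H_free = np.linalg.pinv(W) @ A
        for j in range(k):
            dists = np.linalg.norm(A - H_free[j], axis=1)
            H[j] = A[np.argmin(dists)]
    return W, H

# Toy usage: factorize fake "layer activations" and report how
# faithfully the surrogate W @ H reproduces the original A.
A = np.random.default_rng(1).normal(size=(200, 64))
W, H = prototype_factorize(A, k=10)
rel_err = np.linalg.norm(A - W @ H) / np.linalg.norm(A)
print(f"relative reconstruction error: {rel_err:.3f}")
```

In a surrogate model of the kind the abstract describes, W @ H would stand in for the selected layer's output, and a low reconstruction error is one way to check that the explanation truthfully reflects the model's operation.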