{"title":"Disentangling Convolutional Neural Network towards an explainable Vehicle Classifier","authors":"","doi":"10.1109/DICTA56598.2022.10034615","DOIUrl":null,"url":null,"abstract":"Vehicle category classification is an integral part of intelligent transportation systems (ITS). In this context, vision-based approaches are of increasing interest due to recent progress in camera hardware and machine learning algorithms. Currently, for vision-based classification an end-to-end approach based on Convolutional Neural Networks (CNNs) is the state-of-the-art. However, their inherent black-box approach and the difficulty of modifying existing or adding new categories currently limit their application in ITS. Here, we present an alternative classification approach that partially removes these limitations. It consists of three parts: 1) a CNN-based detector for semantically strong vehicle parts provides the basis for 2) a feature construction step, followed by 3) the final classification based on a decision tree. Ultimately this approach will allow to keep the training-intensive part-detector fixed, once a sufficiently large set of vehicle parts has been trained. Modification of existing categories and addition of new ones are possible by changes to the feature construction and classification steps only. We illustrate the effectiveness of this approach through the extension of the vehicle classifier from 11 to 16 categories by adding an “articulate” feature. In addition, the vehicle parts provide clear interpretability and the conceptually simple feature construction and decision tree classifier provide explainability of the approach. Nevertheless, the part-based classifier achieves comparable accuracy to an end-to-end CNN model trained on all 16 classes.","PeriodicalId":159377,"journal":{"name":"2022 International Conference on Digital Image Computing: Techniques and Applications (DICTA)","volume":"24 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2022-11-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2022 International Conference on Digital Image Computing: Techniques and Applications (DICTA)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/DICTA56598.2022.10034615","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Abstract
Vehicle category classification is an integral part of intelligent transportation systems (ITS). In this context, vision-based approaches are of increasing interest due to recent progress in camera hardware and machine learning algorithms. Currently, end-to-end approaches based on Convolutional Neural Networks (CNNs) are the state of the art for vision-based classification. However, their inherent black-box nature and the difficulty of modifying existing categories or adding new ones currently limit their application in ITS. Here, we present an alternative classification approach that partially removes these limitations. It consists of three parts: 1) a CNN-based detector for semantically strong vehicle parts provides the basis for 2) a feature construction step, followed by 3) the final classification based on a decision tree. Ultimately, this approach makes it possible to keep the training-intensive part detector fixed once a sufficiently large set of vehicle parts has been trained; existing categories can then be modified, and new ones added, by changing only the feature construction and classification steps. We illustrate the effectiveness of this approach by extending the vehicle classifier from 11 to 16 categories through the addition of an “articulate” feature. In addition, the detected vehicle parts provide clear interpretability, and the conceptually simple feature construction and decision tree classifier make the approach explainable. Nevertheless, the part-based classifier achieves accuracy comparable to an end-to-end CNN model trained on all 16 classes.
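The abstract describes the three-stage pipeline (CNN part detector → feature construction → decision tree) but includes no code. The following is a minimal sketch of that structure only, assuming a hypothetical part vocabulary, a hypothetical detector output of (label, confidence) pairs, and scikit-learn's DecisionTreeClassifier; the paper's actual part set, features, and classifier configuration are not specified here.

```python
# Minimal sketch of the three-stage pipeline described in the abstract.
# All names (part labels, feature layout, thresholds) are illustrative
# assumptions, not the paper's actual design.
from collections import Counter
from sklearn.tree import DecisionTreeClassifier, export_text

# Hypothetical part vocabulary a stage-1 CNN detector might emit.
PART_LABELS = ["wheel", "cab", "windshield", "cargo_bed", "trailer_hitch"]

def construct_features(detections):
    """Stage 2: turn raw part detections into a fixed-length feature vector.

    `detections` is a list of (label, confidence) pairs from the stage-1
    part detector (assumed interface). Here we simply count confidently
    detected parts per label; the paper's feature construction may differ.
    """
    counts = Counter(label for label, conf in detections if conf > 0.5)
    return [counts.get(label, 0) for label in PART_LABELS]

# Stage 3: a decision tree over the constructed features.
# Toy training data: two vehicles with hypothetical detections and labels.
train_detections = [
    [("wheel", 0.9)] * 4 + [("cab", 0.8), ("windshield", 0.9)],     # car-like
    [("wheel", 0.9)] * 8 + [("cab", 0.8), ("trailer_hitch", 0.7)],  # truck-like
]
train_labels = ["car", "articulated_truck"]

X = [construct_features(d) for d in train_detections]
clf = DecisionTreeClassifier(max_depth=3).fit(X, train_labels)

# Classify a new vehicle from its (hypothetical) part detections.
query = [("wheel", 0.95)] * 4 + [("cab", 0.9), ("windshield", 0.85)]
print(clf.predict([construct_features(query)]))  # -> ['car']

# The learned tree can be printed as human-readable rules, which is the
# source of the explainability the abstract claims for stage 3.
print(export_text(clf, feature_names=PART_LABELS))
```

Because stage 1 stays fixed in this design, extending the classifier (as in the paper's move from 11 to 16 categories) would only require re-running stages 2 and 3 with an additional feature, without retraining the part detector.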