CNN-based Driver Activity Understanding: Shedding Light on Deep Spatiotemporal Representations

Alina Roitberg, Monica Haurilet, Simon Reiß, R. Stiefelhagen
{"title":"CNN-based Driver Activity Understanding: Shedding Light on Deep Spatiotemporal Representations","authors":"Alina Roitberg, Monica Haurilet, Simon Reiß, R. Stiefelhagen","doi":"10.1109/ITSC45102.2020.9294731","DOIUrl":null,"url":null,"abstract":"While deep Convolutional Neural Networks(CNNs) have become front-runners in the field of driver observation, they are often perceived as black boxes due to their end-to-end nature. Interpretability of such models is vital for building trust and is a serious concern for the integration of CNNs in real-life systems. In this paper, we implement a diagnostic framework for analyzing such models internally and shed light on the learned spatiotemporal representations in a comprehensive study. We examine prominent driver monitoring models from three points of view: (1) visually explaining the prediction by combining the gradient with respect to the intermediate features and the corresponding activation maps, (2) looking at what the network has learned by clustering the internal representations and discovering, how individual classes relate at the feature-level, and (3) conducting a detailed failure analysis with multiple metrics and evaluation settings (e.g. common versus rare behaviors). Among our findings, we show that most of the mistakes can be traced back to learning an object- or a specific movement bias, strong semantic similarity between classes (e.g. preparing food and eating) and underrepresentation in the training set. Besides, we demonstrate the advantages of the Inflated 3D Net compared to other CNNs as it results in more discriminative embedding clusters and in the highest recognition rates based on all metrics.","PeriodicalId":394538,"journal":{"name":"2020 IEEE 23rd International Conference on Intelligent Transportation Systems (ITSC)","volume":"160 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2020-09-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"13","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2020 IEEE 23rd International Conference on Intelligent Transportation Systems (ITSC)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ITSC45102.2020.9294731","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 13

Abstract

While deep Convolutional Neural Networks (CNNs) have become front-runners in the field of driver observation, they are often perceived as black boxes due to their end-to-end nature. Interpretability of such models is vital for building trust and is a serious concern for the integration of CNNs into real-life systems. In this paper, we implement a diagnostic framework for analyzing such models internally and shed light on the learned spatiotemporal representations in a comprehensive study. We examine prominent driver monitoring models from three points of view: (1) visually explaining the prediction by combining the gradient with respect to the intermediate features with the corresponding activation maps, (2) looking at what the network has learned by clustering the internal representations and discovering how individual classes relate at the feature level, and (3) conducting a detailed failure analysis with multiple metrics and evaluation settings (e.g. common versus rare behaviors). Among our findings, we show that most mistakes can be traced back to a learned bias towards a specific object or movement, strong semantic similarity between classes (e.g. preparing food and eating), and underrepresentation in the training set. Furthermore, we demonstrate the advantages of the Inflated 3D Net over other CNNs, as it yields more discriminative embedding clusters and the highest recognition rates across all metrics.
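Point (1) describes a Grad-CAM-style attribution: gradients of the class score with respect to intermediate feature maps are pooled into channel weights that re-weight the corresponding activation maps. The paper does not publish its implementation, so the following PyTorch sketch only illustrates that general recipe for a 3D (video) CNN; `conv_layer`, a handle to the model's last convolutional block, is a hypothetical stand-in.

```python
# Hedged sketch of Grad-CAM-style explanation for a video CNN.
# Not the authors' code; an illustration of the described idea.
import torch
import torch.nn.functional as F

def grad_cam_3d(model, clip, target_class, conv_layer):
    """clip: tensor of shape [1, C, T, H, W]; returns a [T, H, W] heatmap."""
    activations, gradients = [], []

    def fwd_hook(_, __, output):
        activations.append(output)

    def bwd_hook(_, __, grad_output):
        gradients.append(grad_output[0])

    h1 = conv_layer.register_forward_hook(fwd_hook)
    h2 = conv_layer.register_full_backward_hook(bwd_hook)
    try:
        logits = model(clip)                   # [1, num_classes]
        model.zero_grad()
        logits[0, target_class].backward()     # gradient of the class score
    finally:
        h1.remove(); h2.remove()

    acts, grads = activations[0], gradients[0]          # [1, K, t, h, w]
    weights = grads.mean(dim=(2, 3, 4), keepdim=True)   # pool grads per channel
    cam = F.relu((weights * acts).sum(dim=1, keepdim=True))  # [1, 1, t, h, w]
    cam = F.interpolate(cam, size=clip.shape[2:], mode="trilinear",
                        align_corners=False)            # upsample to clip size
    cam = (cam - cam.min()) / (cam.max() - cam.min() + 1e-8)
    return cam.squeeze()
```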
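Point (2), clustering the internal representations, can be approximated as follows: extract penultimate-layer embeddings for each clip, cluster them, and project them to 2-D to inspect how classes relate at the feature level. This is an assumption-laden scikit-learn sketch, not the authors' exact procedure; `embeddings` and `labels` are hypothetical inputs.

```python
# Hedged sketch: feature-level analysis of what the network learned.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.manifold import TSNE

def analyze_embeddings(embeddings, labels, num_classes):
    """embeddings: [N, D] array of clip features; labels: [N] int array."""
    # Cluster features into as many groups as there are activity classes.
    kmeans = KMeans(n_clusters=num_classes, n_init=10, random_state=0)
    assignments = kmeans.fit_predict(embeddings)

    # Per-class distribution over clusters; overlap between two class
    # distributions hints at feature-level semantic similarity
    # (e.g. "preparing food" vs. "eating").
    hist = np.zeros((num_classes, num_classes))
    for c in range(num_classes):
        counts = np.bincount(assignments[labels == c], minlength=num_classes)
        hist[c] = counts / max(counts.sum(), 1)
    similarity = hist @ hist.T  # rough class-to-class affinity

    # 2-D projection for visual inspection of the cluster structure.
    coords = TSNE(n_components=2, random_state=0).fit_transform(embeddings)
    return assignments, similarity, coords
```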
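Point (3), the failure analysis across evaluation settings, amounts to computing per-class recall and aggregating it separately over common and rare behaviors. A minimal sketch under the assumption that training-set frequency defines "common" versus "rare"; the threshold below is arbitrary, not taken from the paper.

```python
# Hedged sketch: multi-metric evaluation split by class frequency.
import numpy as np

def failure_analysis(y_true, y_pred, train_counts, rare_threshold=50):
    """y_true, y_pred: [N] int arrays; train_counts maps class id to
    the number of training samples of that class."""
    classes = np.unique(y_true)
    # Per-class recall exposes which behaviors the model fails on.
    per_class = {c: float((y_pred[y_true == c] == c).mean()) for c in classes}

    def mean_or_nan(vals):
        return float(np.mean(vals)) if vals else float("nan")

    rare = [c for c in classes if train_counts[c] < rare_threshold]
    common = [c for c in classes if train_counts[c] >= rare_threshold]
    return {
        "top1_accuracy": float((y_true == y_pred).mean()),
        "balanced_accuracy": mean_or_nan(list(per_class.values())),
        "mean_recall_common": mean_or_nan([per_class[c] for c in common]),
        "mean_recall_rare": mean_or_nan([per_class[c] for c in rare]),
    }
```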