Title: Driving Behavior Aware Caption Generation for Egocentric Driving Videos Using In-Vehicle Sensors
Authors: Hongkuan Zhang, Koichi Takeda, Ryohei Sasano, Yusuke Adachi, Kento Ohtani
Venue: 2021 IEEE Intelligent Vehicles Symposium Workshops (IV Workshops)
Published: 2021-07-11
DOI: 10.1109/ivworkshops54471.2021.9669259
Citations: 1
Abstract
Video captioning aims to generate textual descriptions of video content. Risk assessment of autonomous driving vehicles has become essential for insurance companies to provide adequate coverage, particularly for emerging MaaS businesses. Insurers need to assess the risk of autonomous driving business plans with fixed routes by analyzing large amounts of driving data, including videos recorded by dash cameras and sensor signals. To make this process more efficient, generating captions for driving videos can provide insurers with concise information for quickly understanding the video content. A natural problem with driving video captioning is that, because the ego-vehicle is absent from these egocentric videos, descriptions of latent driving behaviors are difficult to ground in specific visual cues. To address this issue, we focus on generating driving video captions with accurate behavior descriptions, and propose incorporating in-vehicle sensors, which encapsulate driving behavior information, to assist caption generation. We evaluate our method on the Japanese driving video captioning dataset City Traffic, where the results demonstrate the effectiveness of in-vehicle sensors in improving the overall quality of generated captions, especially in producing more accurate descriptions of driving behaviors.