Better Look Twice - Improving Visual Scene Perception Using a Two-Stage Approach

Christopher B. Kuhn, M. Hofbauer, G. Petrovic, E. Steinbach
DOI: 10.1109/ISM.2020.00013
Published in: 2020 IEEE International Symposium on Multimedia (ISM), December 2020
Citations: 5

Abstract

Accurate visual scene perception plays an important role in fields such as medical imaging and autonomous driving. Recent advances in computer vision allow for accurate image classification, object detection, and even pixel-wise semantic segmentation. Human vision has repeatedly served as an inspiration for new machine vision approaches. In this work, we propose to adapt the “zoom lens model” from psychology for semantic scene segmentation. According to this model, humans first distribute their attention evenly across the entire field of view at low processing power. Then, they follow visual cues to examine a few smaller areas with increased attention. By looking twice, it is possible to refine the initial scene understanding without requiring additional input. We propose to perform semantic segmentation the same way. To obtain visual cues for deciding where to look twice, we use a failure region prediction approach based on a state-of-the-art failure prediction method. The second, focused look is then performed by a dedicated classifier that reclassifies the most challenging patches. Finally, pixels predicted to be errors are updated in the original semantic prediction. Focusing only on the areas with the highest predicted failure probability, we achieve a classification accuracy of over 63% for the predicted failure regions. After updating the initial semantic prediction of 4000 test images from a large-scale driving data set, we reduce the absolute pixel-wise error of 232 road participants by 10% or more.
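The two-stage pipeline described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names (`refine_segmentation`, `reclassify`), the patch size, and the top-k patch selection heuristic are assumptions; the paper's actual failure prediction network and second-stage classifier are learned models.

```python
import numpy as np

def refine_segmentation(seg, failure_prob, reclassify, patch=32, top_k=8):
    """Two-stage refinement sketch.

    seg          -- coarse semantic prediction, an (H, W) integer label map
    failure_prob -- per-pixel predicted failure probability, shape (H, W)
    reclassify   -- second-stage classifier applied only to the patches
                    with the highest mean predicted failure
    """
    H, W = seg.shape
    # First look: score non-overlapping patches by mean predicted failure.
    scores = []
    for y in range(0, H - patch + 1, patch):
        for x in range(0, W - patch + 1, patch):
            scores.append((failure_prob[y:y + patch, x:x + patch].mean(), y, x))
    # Second look: reclassify only the top_k most failure-prone patches
    # and write the result back into the original prediction.
    refined = seg.copy()
    for _, y, x in sorted(scores, reverse=True)[:top_k]:
        refined[y:y + patch, x:x + patch] = reclassify(
            refined[y:y + patch, x:x + patch])
    return refined
```

The key design point mirrors the zoom lens model: the full image is processed once at uniform (low) effort, and the expensive second classifier runs only on the few regions flagged as likely failures, so refinement adds little cost relative to a full second pass.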