WinDB: HMD-Free and Distortion-Free Panoptic Video Fixation Learning

IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 47, no. 3, pp. 1694-1713. Published 2024-12-04. DOI: 10.1109/TPAMI.2024.3510793

Guotao Wang, Chenglizhao Chen, Aimin Hao, Hong Qin, Deng-Ping Fan



Abstract

To date, fixations in panoptic video have usually been collected with a head-mounted display (HMD): users wear an HMD and freely explore the given panoptic scene while their fixations are recorded. However, data collected this way are insufficient for training deep models to accurately predict which regions of a panoptic video are most important when the video contains intermittent salient events. The main reason is that HMD-based collection always leaves "blind zones": users cannot keep turning their heads to explore the entire panoptic scene all the time, so the collected fixations tend to be trapped in a few local views, and the remaining areas become blind zones. Fixation data that merely accumulate local views therefore cannot accurately represent the overall global importance of a complex panoptic scene, which is the main purpose of fixations. To address this, this paper introduces an auxiliary window with dynamic blurring (WinDB), a fixation collection approach for panoptic video that requires no HMD and reflects region-wise importance well. Using WinDB, we have released a new dataset, PanopticVideo-300, containing 300 panoptic clips that cover over 225 categories. Because WinDB collection is free of blind zones, our new set exhibits frequent and intensive "fixation shifting", a distinctive phenomenon that previous research has long overlooked. We therefore present an effective fixation shifting network (FishNet) to handle it. Together, the new fixation collection tool, dataset, and network have the potential to open new directions for fixation-related research and applications in $360^\circ$ environments.
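To make the notion of a blind zone concrete, the following sketch estimates what fraction of the viewing sphere receives no fixation at all. It is not the WinDB method and not a metric from the paper: the fixation format ((latitude, longitude) pairs in degrees), the 15° coverage radius, the grid resolution, and the function names `angular_distance` and `blind_zone_fraction` are all hypothetical choices made for illustration.

```python
import math
from typing import Sequence, Tuple


def angular_distance(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in radians between two points given in radians (haversine)."""
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    a = math.sin(dlat / 2) ** 2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2
    # Clamp guards against tiny floating-point overshoot above 1.0.
    return 2 * math.asin(min(1.0, math.sqrt(a)))


def blind_zone_fraction(
    fixations_deg: Sequence[Tuple[float, float]],
    radius_deg: float = 15.0,  # hypothetical radius within which a fixation "covers" a point
    n_lat: int = 90,
    n_lon: int = 180,
) -> float:
    """Hypothetical metric: share of sphere area farther than radius_deg from every fixation."""
    radius = math.radians(radius_deg)
    fixations = [(math.radians(lat), math.radians(lon)) for lat, lon in fixations_deg]
    total = 0.0
    uncovered = 0.0
    for i in range(n_lat):
        # Cell-center latitude in (-pi/2, pi/2).
        lat = -math.pi / 2 + (i + 0.5) * math.pi / n_lat
        # Equirectangular cells shrink toward the poles; cos(lat) weights them by true area.
        weight = math.cos(lat)
        for j in range(n_lon):
            lon = -math.pi + (j + 0.5) * 2 * math.pi / n_lon
            total += weight
            covered = any(
                angular_distance(lat, lon, f_lat, f_lon) <= radius for f_lat, f_lon in fixations
            )
            if not covered:
                uncovered += weight
    return uncovered / total


# Same number of fixations, clustered in one local view vs. spread around the sphere.
print(blind_zone_fraction([(0.0, 0.0), (0.0, 5.0)]))
print(blind_zone_fraction([(0.0, 0.0), (0.0, 180.0)]))
```

Because each grid cell is weighted by the cosine of its latitude, the result reflects area on the sphere rather than pixel area in the equirectangular frame, which over-represents the poles. Fixations clustered in one local view leave a larger fraction of the sphere uncovered than the same number of fixations spread across the scene.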


