Video Instance Shadow Detection Under the Sun and Sky

IEEE Transactions on Image Processing: a publication of the IEEE Signal Processing Society. Pub Date: 2024-10-02. DOI: 10.1109/TIP.2024.3468877

Zhenghao Xing;Tianyu Wang;Xiaowei Hu;Haoran Wu;Chi-Wing Fu;Pheng-Ann Heng

Volume 33, pages 5715-5726. IEEE Xplore: https://ieeexplore.ieee.org/document/10704578/


Abstract

Instance shadow detection, crucial for applications such as photo editing and light direction estimation, has undergone significant advancements in predicting shadow instances, object instances, and their associations. The extension of this task to videos presents challenges in annotating diverse video data and addressing complexities arising from occlusion and temporary disappearances within associations. In response to these challenges, we introduce ViShadow, a semi-supervised video instance shadow detection (VISD) framework that leverages both labeled image data and unlabeled video data for training. ViShadow features a two-stage training pipeline: the first stage, utilizing labeled image data, identifies shadow and object instances through contrastive learning for cross-frame pairing. The second stage employs unlabeled videos, incorporating an associated cycle consistency loss to enhance tracking ability. A retrieval mechanism is introduced to manage temporary disappearances, ensuring tracking continuity. The SOBA-VID dataset, comprising unlabeled training videos and labeled testing videos, along with the SOAP-VID metric, is introduced for the quantitative evaluation of VISD solutions. The effectiveness of ViShadow is further demonstrated through various video-level applications such as video inpainting, instance cloning, shadow editing, and text-instructed shadow-object manipulation.
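The cross-frame pairing and cycle consistency ideas mentioned in the abstract can be illustrated with a small sketch. This is not ViShadow's implementation: the abstract describes a cycle consistency loss used during training, whereas the code below swaps in a hard forward-backward matching check on toy embeddings to show the underlying intuition. The function names (`cosine`, `best_match`, `cycle_consistent_pairs`) and the embedding values are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def best_match(query, candidates):
    """Index of the candidate embedding most similar to the query."""
    return max(range(len(candidates)), key=lambda j: cosine(query, candidates[j]))


def cycle_consistent_pairs(frame_a, frame_b):
    """Pair instances across two frames, keeping only matches that map back.

    Instance i in frame A is paired with j in frame B only when the best
    match of j back in frame A is i again (a forward-backward check).
    """
    if not frame_a or not frame_b:
        return []
    pairs = []
    for i, emb in enumerate(frame_a):
        j = best_match(emb, frame_b)
        if best_match(frame_b[j], frame_a) == i:
            pairs.append((i, j))
    return pairs


# Toy per-instance embeddings (assumed values): two instances per frame.
frame_t = [[1.0, 0.0], [0.0, 1.0]]
frame_t1 = [[0.1, 0.9], [0.9, 0.1]]
pairs = cycle_consistent_pairs(frame_t, frame_t1)
print(pairs)
```

In this toy setup, an instance that fails the round trip is simply left unpaired. A real system would need an additional step, such as the retrieval mechanism the abstract mentions, to recover instances that temporarily disappear.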


