Deep Learning And Interactivity For Video Rotoscoping

Shivam Saboo, F. Lefèbvre, Vincent Demoulin
DOI: 10.1109/ICIP40778.2020.9191057
Venue: 2020 IEEE International Conference on Image Processing (ICIP)
Publication date: 2020-10-01
Citations: 2

Abstract

In this work we extend the idea of object co-segmentation [10] to perform interactive video segmentation. Our framework predicts the coordinates of vertices along the boundary of an object for two frames of a video simultaneously. The predicted vertices are interactive in nature, and a user interaction on one frame assists the network in correcting the predictions for both frames. We employ an attention mechanism at the encoder stage and a simple combination network at the decoder stage, which allows the network to perform this simultaneous correction efficiently. The framework is also robust to the distance between the two input frames, handling a separation of up to 50 frames between the two inputs. We train our model on a professional dataset, which consists of pixel-accurate annotations produced by professional roto artists. We test our model on DAVIS [15] and achieve state-of-the-art results in both automatic and interactive modes, surpassing Curve-GCN [11] and PolyRNN++ [1].