Title: Video object segmentation by Multi-Scale Pyramidal Multi-Dimensional LSTM with generated depth context
Authors: Qiurui Wang, C. Yuan
Venue: 2016 IEEE International Conference on Image Processing (ICIP), pp. 281-285
DOI: 10.1109/ICIP.2016.7532363
Published: 2016-09-01
Citations: 1
Abstract
Existing deep neural networks, such as Convolutional Neural Networks (CNNs) and Recurrent Neural Networks (RNNs), typically treat volumetric video data as a sequence of independent images and process one frame at a time, so the correlation between frames can hardly be fully exploited. Moreover, depth context plays a unique role in how primates perceive motion scenes, yet it is seldom used when no depth labels are available. In this paper, we use a more suitable architecture, the Multi-Scale Pyramidal Multi-Dimensional Long Short-Term Memory (MSPMD-LSTM), to capture the strong correlations within video frames. Furthermore, depth context is extracted and refined to enhance the performance of the model. Experiments demonstrate that our models yield competitive results on the YouTube-Objects and SegTrack v2 datasets.
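The abstract gives no implementation details, but the core idea it builds on, a multi-dimensional LSTM that sweeps a video volume with recurrence along time and both spatial axes rather than processing frames independently, can be illustrated with a minimal NumPy sketch. This is not the authors' MSPMD-LSTM (it omits the multi-scale pyramid and depth context); all names, shapes, and the single-direction sweep are illustrative assumptions:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def md_lstm_3d(x, W, U, b):
    """One directional sweep of a 3-D multi-dimensional LSTM over a video volume.

    At each position (t, h, w) the cell receives the input vector plus the
    hidden/cell states of its three predecessors along time, height, and width
    (zeros at the volume border). Each predecessor has its own forget gate.

    x : (T, H, Wd, d_in)   input video volume (illustrative features)
    W : (d_in, 6*d)        input-to-gate weights for i, o, g, f_t, f_h, f_w
    U : (3, d, 6*d)        recurrent weights, one matrix per recurrence axis
    b : (6*d,)             gate biases
    returns (T, H, Wd, d)  hidden states at every position
    """
    T, H, Wd, _ = x.shape
    d = U.shape[1]
    h_st = np.zeros((T, H, Wd, d))  # hidden states
    c_st = np.zeros((T, H, Wd, d))  # cell states

    for t in range(T):
        for hh in range(H):
            for w in range(Wd):
                # predecessor states along each axis (zeros at the border)
                preds = [
                    (h_st[t - 1, hh, w], c_st[t - 1, hh, w]) if t > 0 else (np.zeros(d), np.zeros(d)),
                    (h_st[t, hh - 1, w], c_st[t, hh - 1, w]) if hh > 0 else (np.zeros(d), np.zeros(d)),
                    (h_st[t, hh, w - 1], c_st[t, hh, w - 1]) if w > 0 else (np.zeros(d), np.zeros(d)),
                ]
                # pre-activations: input term + one recurrent term per axis
                a = x[t, hh, w] @ W + b
                for k, (hp, _) in enumerate(preds):
                    a = a + hp @ U[k]
                i, o, g, ft, fh, fw = np.split(a, 6)
                # cell update: gated input plus separately forget-gated predecessors
                c = sigmoid(i) * np.tanh(g)
                for fgate, (_, cp) in zip((ft, fh, fw), preds):
                    c = c + sigmoid(fgate) * cp
                h_st[t, hh, w] = sigmoid(o) * np.tanh(c)
                c_st[t, hh, w] = c
    return h_st
```

Because the cell state at each position aggregates context from all earlier positions in time and space, a prediction for one pixel can depend on the whole preceding spatio-temporal volume, which is precisely the frame-to-frame relevance the abstract argues frame-by-frame CNNs miss.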