Low-Complexity Video Classification using Recurrent Neural Networks
Authors: Ifat Abramovich, Tomer Ben-Yehuda, R. Cohen
Published in: 2018 IEEE International Conference on the Science of Electrical Engineering in Israel (ICSEE), 2018-12-01
DOI: 10.1109/ICSEE.2018.8646076 (https://doi.org/10.1109/ICSEE.2018.8646076)
Deep learning has led to great successes in computer vision tasks such as image classification. This is mostly attributed to the availability of large image datasets such as ImageNet. Progress in video classification has been slower, mainly because available video datasets are small and video imposes larger computational and memory demands. To promote innovation and advancement in this field, Google announced the YouTube-8M dataset in 2016, a public video dataset containing about 8 million tagged videos. In this paper, we train several deep neural networks for video classification on a subset of YouTube-8M. Our approach extracts frame-level features using the Inception-v3 network and feeds them to recurrent neural networks with LSTM/BiLSTM units for video classification. We focus on network architectures with low computational requirements and present a detailed performance comparison. We show that for 5 categories more than 96% of the videos are labeled correctly, while for 10 categories more than 89% of the videos are labeled correctly. We demonstrate that transfer learning leads to substantial savings in training time while still offering good results.
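The pipeline described in the abstract (per-frame features passed through a bidirectional LSTM, followed by a classifier) can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation. The frame features are random stand-ins for Inception-v3 outputs, the weights are random and untrained, biases are omitted, and every name and dimension (`FEATURE_DIM`, `HIDDEN_DIM`, `NUM_CLASSES`, `NUM_FRAMES`, `classify_video`, and the helpers) is a hypothetical placeholder chosen to keep the example small.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(matrix, vector):
    """Multiply a list-of-rows matrix by a vector."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


def make_lstm(input_dim, hidden_dim, rng):
    """Random (untrained) weights for the four LSTM gates; biases omitted."""
    def gate():
        return [[rng.uniform(-0.1, 0.1) for _ in range(input_dim + hidden_dim)]
                for _ in range(hidden_dim)]
    return {"i": gate(), "f": gate(), "o": gate(), "g": gate()}


def lstm_last_state(weights, frames):
    """Run one LSTM direction over the frame sequence; return the final hidden state."""
    hidden_dim = len(weights["i"])
    h = [0.0] * hidden_dim
    c = [0.0] * hidden_dim
    for x in frames:
        xh = x + h  # concatenate the frame features with the previous hidden state
        i = [sigmoid(z) for z in matvec(weights["i"], xh)]    # input gate
        f = [sigmoid(z) for z in matvec(weights["f"], xh)]    # forget gate
        o = [sigmoid(z) for z in matvec(weights["o"], xh)]    # output gate
        g = [math.tanh(z) for z in matvec(weights["g"], xh)]  # candidate cell values
        c = [fj * cj + ij * gj for fj, cj, ij, gj in zip(f, c, i, g)]
        h = [oj * math.tanh(cj) for oj, cj in zip(o, c)]
    return h


def softmax(scores):
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def classify_video(frames, fwd, bwd, out_w):
    """BiLSTM: read the frames forward and backward, concatenate, then classify."""
    h_fwd = lstm_last_state(fwd, frames)
    h_bwd = lstm_last_state(bwd, frames[::-1])
    return softmax(matvec(out_w, h_fwd + h_bwd))


# Tiny placeholder sizes; these are assumptions, not values from the paper.
FEATURE_DIM, HIDDEN_DIM, NUM_CLASSES, NUM_FRAMES = 8, 4, 5, 6

rng = random.Random(0)
# Random stand-ins for the per-frame feature vectors of one video.
frames = [[rng.random() for _ in range(FEATURE_DIM)] for _ in range(NUM_FRAMES)]
fwd = make_lstm(FEATURE_DIM, HIDDEN_DIM, rng)
bwd = make_lstm(FEATURE_DIM, HIDDEN_DIM, rng)
out_w = [[rng.uniform(-0.1, 0.1) for _ in range(2 * HIDDEN_DIM)]
         for _ in range(NUM_CLASSES)]

probs = classify_video(frames, fwd, bwd, out_w)
predicted = max(range(NUM_CLASSES), key=probs.__getitem__)
print("class probabilities:", [round(p, 3) for p in probs])
print("predicted class index:", predicted)
```

Because the convolutional network is only used to produce one feature vector per frame, the recurrent part operates on short sequences of vectors rather than raw pixels, which is one plausible reading of how such a design keeps computational requirements low. In practice the weights would be trained, and a deep learning framework would replace the hand-written loops.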