视听CNN利用迁移学习进行电视广告插播检测

IJCCS Indonesian Journal of Computing and Cybernetics Systems Pub Date : 2023-07-31 DOI:10.22146/ijccs.76058

Muhammad Zha'farudin Pudya Wardana, Moh. Edi Wibowo

{"title":"视听CNN利用迁移学习进行电视广告插播检测","authors":"Muhammad Zha'farudin Pudya Wardana, Moh. Edi Wibowo","doi":"10.22146/ijccs.76058","DOIUrl":null,"url":null,"abstract":"The TV commercial detection problem is a hard challenge due to the variety of programs and TV channels. The usage of deep learning methods to solve this problem has shown good results. However, it takes a long time with many training epochs to get high accuracy. This research uses transfer learning techniques to reduce training time and limits the number of training epochs to 20. From video data, the audio feature is extracted with Mel-spectrogram representation, and the visual features are picked from a video frame. The datasets were gathered by recording programs from various TV channels in Indonesia. Pre-trained CNN models such as MobileNetV2, InceptionV3, and DenseNet169 are re-trained and are used to detect commercials at the shot level. We do post-processing to cluster the shots into segments of commercials and non-commercials. The best result is shown by Audio-Visual CNN using transfer learning with an accuracy of 93.26% with only 20 training epochs. It is faster and better than the CNN model without using transfer learning with an accuracy of 88.17% and 77 training epochs. The result by adding post-processing increases the accuracy of Audio-Visual CNN using transfer learning to 96.42%.","PeriodicalId":31625,"journal":{"name":"IJCCS Indonesian Journal of Computing and Cybernetics Systems","volume":"45 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2023-07-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Audio-Visual CNN using Transfer Learning for TV Commercial Break Detection\",\"authors\":\"Muhammad Zha'farudin Pudya Wardana, Moh. Edi Wibowo\",\"doi\":\"10.22146/ijccs.76058\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"The TV commercial detection problem is a hard challenge due to the variety of programs and TV channels. The usage of deep learning methods to solve this problem has shown good results. However, it takes a long time with many training epochs to get high accuracy. This research uses transfer learning techniques to reduce training time and limits the number of training epochs to 20. From video data, the audio feature is extracted with Mel-spectrogram representation, and the visual features are picked from a video frame. The datasets were gathered by recording programs from various TV channels in Indonesia. Pre-trained CNN models such as MobileNetV2, InceptionV3, and DenseNet169 are re-trained and are used to detect commercials at the shot level. We do post-processing to cluster the shots into segments of commercials and non-commercials. The best result is shown by Audio-Visual CNN using transfer learning with an accuracy of 93.26% with only 20 training epochs. It is faster and better than the CNN model without using transfer learning with an accuracy of 88.17% and 77 training epochs. The result by adding post-processing increases the accuracy of Audio-Visual CNN using transfer learning to 96.42%.\",\"PeriodicalId\":31625,\"journal\":{\"name\":\"IJCCS Indonesian Journal of Computing and Cybernetics Systems\",\"volume\":\"45 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2023-07-31\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"IJCCS Indonesian Journal of Computing and Cybernetics Systems\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.22146/ijccs.76058\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"IJCCS Indonesian Journal of Computing and Cybernetics Systems","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.22146/ijccs.76058","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 0

摘要

由于节目和频道的多样性，电视广告的检测问题是一个严峻的挑战。使用深度学习方法来解决这个问题已经显示出良好的效果。然而，该方法需要长时间、多次训练才能达到较高的准确率。本研究使用迁移学习技术来减少训练时间，并将训练次数限制在20次以内。从视频数据中，用梅尔谱图表示提取音频特征，并从视频帧中提取视觉特征。这些数据集是通过录制印度尼西亚各个电视频道的节目收集的。预先训练的CNN模型，如MobileNetV2, InceptionV3和DenseNet169被重新训练，并用于在镜头级别检测商业广告。我们做后期处理，把镜头分成商业和非商业的片段。使用迁移学习的视听CNN效果最好，只需要20个训练epoch，准确率达到93.26%。它比未使用迁移学习的CNN模型更快更好，准确率为88.17%，训练周期为77个。加入后处理后，视听CNN迁移学习的准确率提高到96.42%。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文

微信好友朋友圈 QQ好友复制链接

本刊更多论文

Audio-Visual CNN using Transfer Learning for TV Commercial Break Detection

The TV commercial detection problem is a hard challenge due to the variety of programs and TV channels. The usage of deep learning methods to solve this problem has shown good results. However, it takes a long time with many training epochs to get high accuracy. This research uses transfer learning techniques to reduce training time and limits the number of training epochs to 20. From video data, the audio feature is extracted with Mel-spectrogram representation, and the visual features are picked from a video frame. The datasets were gathered by recording programs from various TV channels in Indonesia. Pre-trained CNN models such as MobileNetV2, InceptionV3, and DenseNet169 are re-trained and are used to detect commercials at the shot level. We do post-processing to cluster the shots into segments of commercials and non-commercials. The best result is shown by Audio-Visual CNN using transfer learning with an accuracy of 93.26% with only 20 training epochs. It is faster and better than the CNN model without using transfer learning with an accuracy of 88.17% and 77 training epochs. The result by adding post-processing increases the accuracy of Audio-Visual CNN using transfer learning to 96.42%.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

IJCCS Indonesian Journal of Computing and Cybernetics Systems

自引率

0.00%

发文量

审稿时长

12 weeks