An efficient framework on large-scale video genre classification

Ning Zhang, L. Guan
Published in: 2010 IEEE International Workshop on Multimedia Signal Processing
Publication date: 2010-10-01
DOI: 10.1109/MMSP.2010.5662069 (https://doi.org/10.1109/MMSP.2010.5662069)
Citations: 8

Abstract

Efficient data mining and indexing are important for multimedia analysis and retrieval. In large-scale video analysis, effective genre categorization plays an important role and serves as one of the fundamental steps for data mining. Existing works rely on domain-knowledge dependent feature extraction, which limits both genre diversification and data volume scalability. In this paper, we propose a systematic framework for automatically classifying video genres, using domain-knowledge independent descriptors for feature extraction and a bag-of-visual-words (BoW) based model for compact video representation. The scale-invariant feature transform (SIFT) local descriptor, accelerated by GPU hardware, is adopted for feature extraction. A BoW model with an innovative codebook generation scheme using bottom-up two-layer K-means clustering is proposed to abstract the video characteristics. Besides the histogram-based distribution for summarizing video data, a modified latent Dirichlet allocation (mLDA) based distribution is also introduced. At the classification stage, a k-nearest neighbor (k-NN) classifier is employed. Compared with the state-of-the-art large-scale genre categorization in [1], experimental results on a 23-sports dataset demonstrate that our proposed framework achieves comparable classification accuracy with 27% and 64% expansion in data volume and diversity, respectively.
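The codebook, histogram, and k-NN stages named in the abstract can be illustrated with a minimal sketch. The abstract does not specify how the two K-means layers are arranged, so the code assumes one plausible reading: the first layer clusters each video's descriptors separately, and the second layer clusters the pooled per-video centroids into the final codebook. All function names, the parameters `k1` and `k2`, and the 2-D toy descriptors (real SIFT descriptors are 128-dimensional) are hypothetical. GPU-accelerated SIFT extraction and the mLDA-based distribution are not covered.

```python
import random
from collections import Counter


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(point, centers):
    """Index of the center closest to point (lowest index wins ties)."""
    return min(range(len(centers)), key=lambda i: sq_dist(point, centers[i]))


def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's K-means on a list of vectors; returns k centroids."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            groups[nearest(p, centers)].append(p)
        # Recompute each centroid; keep the old one if its cluster is empty.
        centers = [
            [sum(col) / len(g) for col in zip(*g)] if g else c
            for g, c in zip(groups, centers)
        ]
    return centers


def two_layer_codebook(videos, k1, k2):
    """Assumed bottom-up scheme: per-video K-means, then K-means on the
    pooled per-video centroids. `videos` is a list of descriptor lists."""
    local_centroids = []
    for i, descriptors in enumerate(videos):
        local_centroids.extend(kmeans(descriptors, k1, seed=i))
    return kmeans(local_centroids, k2, seed=len(videos))


def bow_histogram(descriptors, codebook):
    """Normalized histogram of nearest-codeword assignments."""
    counts = Counter(nearest(d, codebook) for d in descriptors)
    return [counts[i] / len(descriptors) for i in range(len(codebook))]


def knn_predict(query, train_hists, train_labels, k=3):
    """Majority vote among the k training histograms closest to query."""
    ranked = sorted(range(len(train_hists)),
                    key=lambda i: sq_dist(query, train_hists[i]))
    votes = Counter(train_labels[i] for i in ranked[:k])
    return votes.most_common(1)[0][0]


# Toy demo: synthetic 2-D "descriptors" standing in for SIFT features.
rng = random.Random(42)


def toy_video(center, n=30):
    """Generate n random 2-D descriptors scattered around (center, center)."""
    return [[rng.gauss(center, 1.0), rng.gauss(center, 1.0)] for _ in range(n)]


train = [toy_video(0.0) for _ in range(3)] + [toy_video(10.0) for _ in range(3)]
labels = ["genre_a"] * 3 + ["genre_b"] * 3
codebook = two_layer_codebook(train, k1=3, k2=4)
train_hists = [bow_histogram(v, codebook) for v in train]
predicted = knn_predict(bow_histogram(toy_video(10.0), codebook),
                        train_hists, labels, k=3)
print(predicted)
```

Clustering per video first keeps each first-layer K-means run small, so the second layer only sees `k1` centroids per video rather than every raw descriptor. This is one way such a scheme could help with data volume, though the paper's actual design may differ.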