
2012 IEEE International Symposium on Multimedia: Latest Publications

Efficient Control of PTZ Cameras in Automated Video Surveillance Systems
Pub Date : 2012-12-10 DOI: 10.1109/ISM.2012.72
Musab S. Al-Hadrusi, Nabil J. Sarhan
This paper deals with the camera control problem in automated video surveillance. We develop a solution that seeks to optimize the overall subject recognition probability by controlling the pan, tilt, and zoom of the deployed Pan/Tilt/Zoom (PTZ) cameras. Since the number of subjects is usually much larger than the number of video cameras, the problem to be addressed is how to assign subjects to these cameras. The camera control is based on the direction of the subject's movement, its location, its distances from the cameras, occlusion, the overall recognition probability achieved so far, and the expected time before the subject leaves the site, as well as the movements of the cameras and their capabilities and limitations. The developed solution works with realistic 3D environments, not just 2D scenes. We analyze the effectiveness of the proposed solution through extensive simulation.
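The abstract lists the cues the controller weighs but not the assignment algorithm itself. As a minimal illustrative sketch (not the authors' method), one can score every camera–subject pair from those cues and assign greedily; the score weights and dictionary fields below are assumptions:

```python
import numpy as np

def recognition_score(camera, subject):
    """Hypothetical score for pointing `camera` at `subject`, combining the cues the
    abstract lists: distance, movement direction, occlusion, recognition so far,
    and the expected time before the subject leaves the site. Weights are illustrative."""
    d = np.linalg.norm(camera["pos"] - subject["pos"])
    facing = max(0.0, float(np.dot(subject["heading"], camera["dir"])))  # walking toward the camera
    visible = 1.0 - subject["occlusion"]            # 0 = fully occluded, 1 = fully visible
    novelty = 1.0 - subject["recognized_prob"]      # prefer subjects not yet well recognized
    urgency = 1.0 / max(subject["time_to_exit"], 1.0)
    return (facing * visible * novelty + urgency) / (1.0 + d)

def assign_subjects(cameras, subjects):
    """Greedy assignment of one subject per camera, highest-scoring pairs first."""
    pairs = sorted(
        ((recognition_score(c, s), ci, si)
         for ci, c in enumerate(cameras) for si, s in enumerate(subjects)),
        reverse=True)
    assignment, used_c, used_s = {}, set(), set()
    for _, ci, si in pairs:
        if ci not in used_c and si not in used_s:
            assignment[ci] = si
            used_c.add(ci)
            used_s.add(si)
    return assignment   # camera index -> subject index to pan/tilt/zoom towards
```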
Citations: 9
Interframe Coding of Canonical Patches for Mobile Augmented Reality
Pub Date : 2012-12-10 DOI: 10.1109/ISM.2012.18
Mina Makar, Sam S. Tsai, V. Chandrasekhar, David M. Chen, B. Girod
Local features are widely used for content-based image retrieval and augmented reality applications. Typically, feature descriptors are calculated from the gradients of a canonical patch around a repeatable key point in the image. In previous work, we showed that one can alternatively transmit the compressed canonical patch and perform descriptor computation at the receiving end with comparable performance. In this paper, we propose a temporally coherent key point detector in order to allow efficient interframe coding of canonical patches. In inter-patch compression, one strives to transmit each patch with as few bits as possible by simply modifying a previously transmitted patch. This enables server-based mobile augmented reality where a continuous stream of salient information, sufficient for the image-based retrieval, can be sent over a wireless link at the smallest possible bit-rate. Experimental results show that our technique achieves a similar image matching performance at 1/10 of the bit-rate when compared to detecting key points independently frame-by-frame.
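A minimal sketch of the inter-patch idea described above: encode each canonical patch as a quantised residual against the previously transmitted patch, or skip it entirely when it has barely changed. The skip threshold, quantisation step, and patch format are assumptions for illustration, not the paper's codec:

```python
import numpy as np

def encode_patch(curr, prev=None, q_step=8, skip_thresh=2.0):
    """Encode an 8-bit canonical patch relative to the previously transmitted one.
    Returns ('skip', None), ('inter', residual) or ('intra', patch)."""
    curr = curr.astype(np.int16)
    if prev is not None:
        residual = curr - prev.astype(np.int16)
        if np.abs(residual).mean() < skip_thresh:
            return "skip", None                       # patch barely changed: send nothing
        return "inter", np.round(residual / q_step).astype(np.int8)
    return "intra", curr.astype(np.uint8)             # first occurrence: send the patch itself

def decode_patch(mode, payload, prev=None, q_step=8):
    """Reconstruct the patch at the receiver before descriptor computation."""
    if mode == "skip":
        return prev.copy()
    if mode == "inter":
        rec = prev.astype(np.int16) + payload.astype(np.int16) * q_step
        return np.clip(rec, 0, 255).astype(np.uint8)
    return payload.copy()                             # intra
```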
Citations: 18
Multimodal Information Fusion of Audio Emotion Recognition Based on Kernel Entropy Component Analysis
Pub Date : 2012-12-10 DOI: 10.1142/S1793351X13400023
Zhibing Xie, L. Guan
This paper focuses on the application of novel information-theoretic tools in the area of information fusion. Feature transformation and fusion are critical to the performance of information fusion; however, the majority of existing works depend on second-order statistics, which are only optimal for Gaussian-like distributions. In this paper, the integration of information fusion techniques and kernel entropy component analysis provides a new information-theoretic tool. The fusion of features is realized using an information-entropy descriptor and optimized by entropy estimation. A novel multimodal information fusion strategy for audio emotion recognition based on kernel entropy component analysis (KECA) is presented. The effectiveness of the proposed solution is evaluated through experimentation on two audiovisual emotion databases. Experimental results show that the proposed solution outperforms existing methods, especially when the dimension of the feature space is substantially reduced. The proposed method offers a general theoretical analysis that gives us an approach to applying information theory to multimedia research.
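KECA itself is well defined: eigendecompose a kernel matrix and keep the axes that contribute most to the Rényi quadratic entropy estimate, rather than those with the largest eigenvalues. A minimal NumPy sketch follows; the Gaussian kernel width and the downstream fusion-by-concatenation step are assumptions, since the paper's exact fusion strategy is not given here:

```python
import numpy as np

def keca(X, n_components=10, sigma=1.0):
    """Kernel Entropy Component Analysis: project onto the kernel eigen-directions
    that contribute most to the Renyi quadratic entropy estimate."""
    # Gaussian kernel matrix
    sq = np.sum(X ** 2, axis=1)
    K = np.exp(-(sq[:, None] + sq[None, :] - 2 * X @ X.T) / (2 * sigma ** 2))
    # Eigendecomposition (K is symmetric, so eigh is appropriate)
    lam, E = np.linalg.eigh(K)
    # Entropy contribution of eigenpair i: (sqrt(lambda_i) * sum(e_i))^2
    contrib = lam * (E.sum(axis=0) ** 2)
    idx = np.argsort(contrib)[::-1][:n_components]
    # Projection of the training samples onto the selected components
    return E[:, idx] * np.sqrt(np.clip(lam[idx], 0, None))

# e.g. fuse audio and visual features by concatenating their KECA projections:
# fused = np.hstack([keca(audio_feats, 20), keca(visual_feats, 20)])
```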
Citations: 40
Logo Classification with Edge-Based DAISY Descriptor
Pub Date : 2012-12-10 DOI: 10.1109/ISM.2012.50
B. Lei, V. Thing, Yu Chen, Wee-Yong Lim
The classification of merchandise logo images poses significant challenges: only a few key points can be found in the relatively small logo images due to large variations in texture, poor illumination and, in general, a lack of discriminative features. This paper addresses these difficulties by introducing an integrated approach that classifies merchandise logos with a combination of a local edge-based DAISY descriptor, spatial histograms and salient region detection. During the training phase, after edge extraction, merchandise logos are described with a set of SIFT-like DAISY descriptors computed efficiently and densely along edge pixels. Visual word vocabulary generation and spatial histograms are used to describe the images/regions. A saliency map for object detection is adopted to narrow down and localize the logos. A feature map approximating a non-linear kernel is also used to facilitate classification by a linear SVM classifier. The experimental results demonstrate that the Edge-based DAISY (EDAISY) descriptor outperforms the state-of-the-art SIFT and DSIFT descriptors in terms of classification accuracy on a collected logo image dataset.
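A rough approximation of the "DAISY along edge pixels" idea using off-the-shelf building blocks (scikit-image's grid-based `daisy` plus a Canny edge map). The descriptor parameters and the way descriptors are restricted to edge locations are assumptions here, not the authors' exact EDAISY extraction:

```python
import numpy as np
from skimage.feature import canny, daisy

def edge_daisy(gray, step=4, radius=15):
    """Compute DAISY descriptors on a dense grid and keep only those whose
    centre lies on (or next to) a Canny edge - an approximation of EDAISY."""
    edges = canny(gray, sigma=2.0)
    descs = daisy(gray, step=step, radius=radius, rings=3, histograms=8, orientations=8)
    keep = []
    for i in range(descs.shape[0]):
        for j in range(descs.shape[1]):
            y, x = radius + i * step, radius + j * step    # grid centre in image coordinates
            if edges[max(y - 1, 0):y + 2, max(x - 1, 0):x + 2].any():
                keep.append(descs[i, j])
    return np.array(keep)   # one row per edge-centred descriptor
```

The retained descriptors would then feed the visual-word vocabulary and spatial-histogram stages described in the abstract.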
Citations: 7
FaceFetch: A User Emotion Driven Multimedia Content Recommendation System Based on Facial Expression Recognition
Pub Date : 2012-12-10 DOI: 10.1109/ISM.2012.24
Mahesh Babu Mariappan, Myunghoon Suk, B. Prabhakaran
Recognition of users' facial expressions allows researchers to build context-aware applications that adapt to the users' emotional states. Facial expression recognition is an active area of research in the computer vision community. In this paper, we present FaceFetch, a novel context-based multimedia content recommendation system that understands a user's current emotional state (happiness, sadness, fear, disgust, surprise and anger) through facial expression recognition and recommends multimedia content to the user. Our system can understand a user's emotional state through a desktop as well as a mobile user interface and pull multimedia content such as music, movies and other videos of interest to the user from the cloud with near real-time performance.
Citations: 21
Efficient Filtering of JPEG Images
Pub Date : 2012-12-10 DOI: 10.1109/ISM.2012.88
David Edmundson, G. Schaefer
With image databases growing rapidly, efficient methods for content-based image retrieval (CBIR) are highly sought after. In this paper, we present a very fast method for filtering JPEG compressed images to discard irrelevant pictures. We show that compressing images using individually optimised quantisation tables not only maintains high image quality and therefore allows for improved compression rates, but that the quantisation tables themselves provide a useful image descriptor for CBIR. Visual similarity between images can thus be expressed as similarity between their quantisation tables. As these are stored in the JPEG header, feature extraction and similarity computation can be performed extremely fast, and we consequently employ our method as an initial filtering step for a subsequent CBIR algorithm. We show, on a benchmark dataset of more than 30,000 images, that we can filter out 80% or more of the images without a drop in retrieval performance while reducing the online retrieval time by a factor of about 5.
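The filtering idea is simple enough to sketch: read the quantisation tables straight from the JPEG header (Pillow exposes them without decoding the pixel data) and compare them between query and candidates. The L1 distance and threshold below are illustrative assumptions, not the paper's exact descriptor or similarity measure:

```python
import numpy as np
from PIL import Image

def quant_descriptor(path):
    """Read the JPEG quantisation table(s) from the file header via Pillow.
    Assumes `path` points to a JPEG; other formats have no `quantization` attribute."""
    with Image.open(path) as im:
        tables = im.quantization            # dict: table id -> 64 coefficients
        return np.concatenate([np.asarray(tables[k], dtype=np.float32)
                               for k in sorted(tables)])

def filter_candidates(query_path, candidate_paths, threshold=200.0):
    """Keep only candidates whose quantisation tables are close to the query's."""
    q = quant_descriptor(query_path)
    kept = []
    for p in candidate_paths:
        d = quant_descriptor(p)
        if d.shape == q.shape and np.abs(d - q).sum() < threshold:
            kept.append(p)                  # passed the cheap pre-filter; run full CBIR on these
    return kept
```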
Citations: 1
GPU-Enabled High Performance Online Visual Search with High Accuracy
Pub Date : 2012-12-10 DOI: 10.1109/ISM.2012.85
Ali Cevahir, Junji Torii
We propose an online image search engine based on local image features (key points), which runs fully on GPUs. State-of-the-art visual image retrieval techniques are based on the bag-of-visual-words (BoV) model, an analogy to text-based search. In BoV, each key point is rounded off to the nearest visual word. In this work, by contrast, we exploit the vector computation power of GPUs to utilize the real values of key point descriptors. We match key points in two steps. The idea in the first step is similar to visual word matching in BoV. In the second step, we do matching at the key point level. By keeping the identity of each key point, the closest key points are accurately retrieved in real time. Image search has different characteristics from textual search. We implement one-to-one key point matching, which is more natural for images. Our experiments reveal a 265x speedup for offline index generation, a 104x speedup for online index search and a 20.5x speedup for online key point matching when compared to the CPU implementation. Our proposed key-point-matching-based search improves the accuracy of BoV by 9.5%.
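The two-step structure (coarse visual-word lookup, then exact nearest-neighbour search among that word's key points) can be sketched on the CPU with NumPy. The GPU kernels, cluster count, and distance metric are assumptions; only the overall structure follows the abstract:

```python
import numpy as np

def build_index(db_descs, centers):
    """Step 0 (offline): bucket every database descriptor under its nearest visual word."""
    words = np.argmin(((db_descs[:, None, :] - centers[None, :, :]) ** 2).sum(-1), axis=1)
    return {w: np.where(words == w)[0] for w in np.unique(words)}

def match(query_desc, db_descs, centers, index):
    """Step 1: find the query's visual word. Step 2: exact NN among that word's key points."""
    w = int(np.argmin(((centers - query_desc) ** 2).sum(-1)))
    ids = index.get(w, np.array([], dtype=int))
    if ids.size == 0:
        return None
    d = ((db_descs[ids] - query_desc) ** 2).sum(-1)
    return int(ids[np.argmin(d)])      # index of the closest database key point
```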
Citations: 6
A Cloud-Based Collaborative and Automatic Video Editor
Pub Date : 2012-12-10 DOI: 10.1109/ISM.2012.78
A. Outtagarts, Abderrazagh Mbodj
Automatic video editing is a hot topic due to the rapid growth of video usage. In this paper, we present a cloud-based tool and an approach to automatic video editing based on keywords extracted from the audio transcription. Using the text transcript of the audio, video sequences are selected and chained to automatically create a new video whose duration is fixed by the user. A cloud-based video editor allows users to collaboratively edit the video.
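The selection step can be sketched directly from that description: pick transcript segments that mention the user's keywords and chain them until the user-fixed duration is reached. The segment format and the simple keyword test are assumptions for illustration:

```python
def select_segments(segments, keywords, max_duration):
    """segments: list of (start_sec, end_sec, text) from the audio transcription.
    Returns the (start, end) cuts to concatenate, up to max_duration seconds."""
    keywords = {k.lower() for k in keywords}
    hits = [(s, e, t) for s, e, t in segments
            if keywords & set(t.lower().split())]         # segment mentions a keyword
    edit, total = [], 0.0
    for s, e, _ in hits:                                   # keep original temporal order
        if total + (e - s) > max_duration:
            break
        edit.append((s, e))
        total += e - s
    return edit

# e.g. cuts = select_segments(transcript, {"goal", "celebration"}, max_duration=60)
```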
Citations: 3
Towards Automatic Stereoscopic Video Synthesis from a Casual Monocular Video
Pub Date : 2012-12-10 DOI: 10.1109/ISM.2012.64
Lin Zhong, Sen Wang, Minwoo Park, Rodney L. Miller, Dimitris N. Metaxas
Automatically synthesizing 3D content from a casual monocular video has become an important problem. Previous works either use no geometry information or rely on precise 3D geometry information. Therefore, they cannot obtain reasonable results if the 3D structure in the scene is complex or if noisy 3D geometry information is estimated from monocular videos. In this paper, we present an automatic and robust framework to synthesize stereoscopic videos from casual 2D monocular videos. First, 3D geometry information (e.g., camera parameters, depth map) is extracted from the 2D input video. Then a Bayesian-based View Synthesis (BVS) approach is proposed to render high-quality new virtual views for stereoscopic video while dealing with noisy 3D geometry information. Extensive experiments on various videos demonstrate that BVS can synthesize more accurate views than other methods, and our proposed framework is also able to generate high-quality 3D videos.
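The rendering step that such view synthesis builds on can be illustrated with classic depth-image-based forward warping. This is a toy sketch, not the paper's Bayesian formulation; the baseline and focal length are assumed inputs, and disocclusion holes are left unfilled:

```python
import numpy as np

def warp_right_view(left, depth, baseline, focal):
    """Forward-warp a left view into a virtual right view using per-pixel depth.
    Nearer pixels win when several map to the same target column (simple z-buffer)."""
    h, w = depth.shape
    right = np.zeros_like(left)
    zbuf = np.full((h, w), np.inf)
    disparity = baseline * focal / np.maximum(depth, 1e-6)
    for y in range(h):
        for x in range(w):
            xr = int(round(x - disparity[y, x]))
            if 0 <= xr < w and depth[y, x] < zbuf[y, xr]:
                zbuf[y, xr] = depth[y, x]
                right[y, xr] = left[y, x]
    return right   # holes (disocclusions) remain zero and would be inpainted afterwards
```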
从任意的单目视频中自动合成3D内容已经成为一个重要的问题。以前的作品要么不使用几何信息,要么依赖于精确的三维几何信息。因此,如果场景中的三维结构比较复杂,或者从单目视频中估计有噪声的三维几何信息,则无法得到合理的结果。在本文中,我们提出了一个自动的、鲁棒的框架来从随意的2D单目视频合成立体视频。首先,从二维输入视频中提取三维几何信息(如摄像机参数、深度图)。然后提出了一种基于贝叶斯的视图合成(BVS)方法,为立体视频呈现高质量的新虚拟视图,以处理有噪声的三维几何信息。在各种视频上的大量实验表明,BVS可以比其他方法合成更精确的视图,并且我们提出的框架也可以生成高质量的3D视频。
{"title":"Towards Automatic Stereoscopic Video Synthesis from a Casual Monocular Video","authors":"Lin Zhong, Sen Wang, Minwoo Park, Rodney L. Miller, Dimitris N. Metaxas","doi":"10.1109/ISM.2012.64","DOIUrl":"https://doi.org/10.1109/ISM.2012.64","url":null,"abstract":"Automatically synthesizing 3D content from a causal monocular video has become an important problem. Previous works either use no geometry information, or rely on precise 3D geometry information. Therefore, they cannot obtain reasonable results if the 3D structure in the scene is complex, or noisy 3D geometry information is estimated from monocular videos. In this paper, we present an automatic and robust framework to synthesize stereoscopic videos from casual 2D monocular videos. First, 3D geometry information (e.g., camera parameters, depth map) are extracted from the 2D input video. Then a Bayesian-based View Synthesis (BVS) approach is proposed to render high-quality new virtual views for stereoscopic video to deal with noisy 3D geometry information. Extensive experiments on various videos demonstrate that BVS can synthesize more accurate views than other methods, and our proposed framework also be able to generate high-quality 3D videos.","PeriodicalId":282528,"journal":{"name":"2012 IEEE International Symposium on Multimedia","volume":"45 18 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-12-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130782676","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Enhancing the MST-CSS Representation Using Robust Geometric Features, for Efficient Content Based Video Retrieval (CBVR)
Pub Date : 2012-12-10 DOI: 10.1109/ISM.2012.71
C. Chattopadhyay, Sukhendu Das
Multi-Spectro-Temporal Curvature Scale Space (MST-CSS) was proposed as a video content descriptor in earlier work, where peak and saddle points were used as feature points. However, these are inadequate for capturing the salient features of the MST-CSS surface, producing poor retrieval results. To overcome these limitations, we propose EMST-CSS (Enhanced MST-CSS) as a better feature representation with an improved matching method for CBVR (Content Based Video Retrieval). A comparative study with the existing MST-CSS representation and two state-of-the-art methods for CBVR shows enhanced performance on one synthetic and two real-world datasets.
Citations: 6