
Latest publications from the 2015 IEEE International Symposium on Multimedia (ISM)

Personalized Indexing of Attention in Lectures -- Requirements and Concept
Pub Date : 2015-12-01 DOI: 10.1109/ISM.2015.44
Sebastian Pospiech, N. Birnbaum, L. Knipping, R. Mertens
Web lectures can be employed in a variety of didactic scenarios, ranging from an add-on for a live lecture to stand-alone learning content. In all of these scenarios, though less so in the stand-alone one, indexing and navigation are crucial for real-world usability. As a consequence, many approaches have been devised, such as slide-based indexing, transcript-based indexing, collaborative manual indexing, as well as individual or social indexing based on viewing behavior. The approach proposed in this paper takes individual indexing based on viewing behavior two steps further in that it (a) indexes the recording at production time in the lecture hall and (b) actively analyzes the students' attention focus instead of passively recording viewing time as done in conventional footprinting. In order to track student attention during the lecture, it is necessary to record and analyze the students' behaviour in parallel to the lecture and to synchronize both data streams. This paper discusses the architecture required for personalized attention-based indexing, possible problems, and strategies to tackle them.
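The core of the indexing step described above can be sketched as follows: once the attention stream is synchronized with the recording's clock, consecutive high-attention samples collapse into (start, end) segments that serve as personalized navigation anchors. All names here (`AttentionSample`, `ATTENTION_THRESHOLD`) are illustrative assumptions, not taken from the paper.

```python
from dataclasses import dataclass
from typing import List, Tuple

ATTENTION_THRESHOLD = 0.7  # assumed cut-off for "focused" attention

@dataclass
class AttentionSample:
    t: float       # seconds since lecture start (shared clock with recording)
    score: float   # 0.0 (distracted) .. 1.0 (fully focused)

def index_segments(samples: List[AttentionSample]) -> List[Tuple[float, float]]:
    """Collapse consecutive above-threshold samples into (start, end)
    segments that can anchor personalized navigation in the recording."""
    segments, start = [], None
    for s in samples:
        if s.score >= ATTENTION_THRESHOLD and start is None:
            start = s.t                      # attention segment opens
        elif s.score < ATTENTION_THRESHOLD and start is not None:
            segments.append((start, s.t))    # attention segment closes
            start = None
    if start is not None:                    # close a segment still open at the end
        segments.append((start, samples[-1].t))
    return segments

samples = [AttentionSample(t, sc) for t, sc in
           [(0, 0.2), (10, 0.8), (20, 0.9), (30, 0.4), (40, 0.75), (50, 0.8)]]
print(index_segments(samples))  # [(10, 30), (40, 50)]
```

A real system would additionally have to resolve clock drift between the capture devices and the lecture recorder before this per-student pass runs.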
Citations: 0
Employing Sensors and Services Fusion to Detect and Assess Driving Events
Pub Date : 2015-12-01 DOI: 10.1109/ISM.2015.121
Seyed Vahid Hosseinioun, Hussein Al Osman, Abdulmotaleb El Saddik
With the remarkable increase in the use of sensors in our daily lives, various methods have been devised to detect events in a driving environment using smartphones, as they provide two main advantages: they eliminate the need for dedicated hardware in vehicles, and they are widely accessible. Since rewarding safe driving is an important issue for insurance companies, some companies are implementing Usage-Based Insurance (UBI) as opposed to traditional history-based plans. The collection of driving events, such as acceleration and turning, is a prerequisite for the adoption of such plans. Mobile phone sensors are capable of detecting whether a car is accelerating or braking, while through service fusion we can detect other events such as speeding or instances of severe weather. We propose a new and robust hybrid classification algorithm that detects acceleration-based events with an F1-score of 0.9304 and turn events with an F1-score of 0.9038. We further propose a method for measuring a driving performance index using the detected events.
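As a minimal illustration of the sensing side (not the paper's hybrid classifier), harsh acceleration and braking events can be flagged from longitudinal accelerometer readings with a simple threshold; the 3.0 m/s² cut-off is an assumption for demonstration only.

```python
HARSH_THRESHOLD = 3.0  # m/s^2, assumed cut-off for a "harsh" event

def detect_events(accel_samples):
    """Return (sample_index, label) pairs for longitudinal accelerometer
    readings whose magnitude exceeds the threshold."""
    events = []
    for i, a in enumerate(accel_samples):
        if a >= HARSH_THRESHOLD:
            events.append((i, "harsh_acceleration"))
        elif a <= -HARSH_THRESHOLD:
            events.append((i, "harsh_braking"))
    return events

readings = [0.5, 1.2, 3.4, 0.9, -3.8, -1.0]  # m/s^2 along the driving axis
print(detect_events(readings))  # [(2, 'harsh_acceleration'), (4, 'harsh_braking')]
```

The paper's approach goes further by fusing such sensor-level detections with external services (e.g. speed limits, weather) before classification.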
Citations: 13
Exploring the Complementarity of Audio-Visual Structural Regularities for the Classification of Videos into TV-Program Collections
Pub Date : 2015-12-01 DOI: 10.1109/ISM.2015.133
G. Sargent, P. Hanna, H. Nicolas, F. Bimbot
This article analyzes the structural regularities of the audio and video streams of TV programs and explores their potential for classifying videos into program collections. Our approach is based on the spectral analysis of distance matrices representing the short- and long-term dependencies within the audio and visual modalities of a video. We propose to compare two videos by their respective spectral features. We assess the benefits brought by the two modalities to performance in the context of a K-nearest-neighbor classification, and we test our approach in the context of an unsupervised clustering algorithm. These evaluations are performed on two datasets of French and Italian TV programs.
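The general idea of a spectral signature over a self-distance matrix can be sketched as follows. This is an illustrative reading of the abstract, not the authors' feature pipeline: per-frame features yield a pairwise-distance matrix, whose leading eigenvalue profile serves as a compact structural descriptor that two videos can be compared on.

```python
import numpy as np

def spectral_signature(features: np.ndarray, k: int = 5) -> np.ndarray:
    """features: (n_frames, dim). Returns the k largest absolute eigenvalues
    of the frame-to-frame distance matrix, normalized for scale invariance."""
    # pairwise Euclidean distances between all frame feature vectors
    d = np.linalg.norm(features[:, None, :] - features[None, :, :], axis=-1)
    eig = np.sort(np.abs(np.linalg.eigvalsh(d)))[::-1][:k]
    return eig / (eig.sum() + 1e-12)

def signature_distance(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.linalg.norm(spectral_signature(a) - spectral_signature(b)))

rng = np.random.default_rng(0)
v1 = rng.normal(size=(40, 8))                   # stand-in per-frame features
v2 = v1 + rng.normal(scale=0.01, size=(40, 8))  # near-duplicate structure
v3 = rng.normal(size=(40, 8))                   # unrelated "video"
print(signature_distance(v1, v2), signature_distance(v1, v3))
```

A K-nearest-neighbor classifier over such signatures, one per modality, would then expose the complementarity the paper investigates.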
Citations: 3
A Novel Two Pass Rate Control Scheme for Variable Bit Rate Video Streaming
Pub Date : 2015-12-01 DOI: 10.1109/ISM.2015.32
M. VenkataPhaniKumar, K. C. R. C. Varma, S. Mahapatra
In this paper, a novel two-pass rate control scheme is proposed to achieve consistent visual quality for variable bit rate (VBR) video streaming. The rate-distortion (RD) characteristics of each frame are used to establish a frame complexity model, which is later used along with statistics collected in the first pass to derive an optimal quantization parameter for encoding the frame in the second pass. The experimental results demonstrate that the proposed rate control scheme significantly outperforms the existing rate control mechanism in the Joint Model (JM) reference software in terms of Peak Signal to Noise Ratio (PSNR) and consistency of perceptual visual quality while achieving the target bit rate. Further, the proposed scheme is validated through implementation on a miniature test-bed.
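The general two-pass shape can be sketched as below; the paper's actual complexity model and QP derivation are more elaborate, and both the base QP and the logarithmic adjustment here are assumptions chosen only to show the mechanism: complex frames (measured in the first pass) get a lower QP (more bits) so perceived quality stays even.

```python
import math

BASE_QP = 30          # assumed base quantization parameter
SENSITIVITY = 6.0     # assumed scaling of the complexity adjustment

def second_pass_qp(complexities):
    """Map per-frame complexities from the first pass to per-frame QPs
    for the second pass: above-average complexity lowers the QP."""
    mean_c = sum(complexities) / len(complexities)
    qps = []
    for c in complexities:
        # more complex than average -> spend more bits (lower QP)
        delta = SENSITIVITY * math.log2(c / mean_c)
        qps.append(round(BASE_QP - delta))
    return qps

print(second_pass_qp([1.0, 2.0, 4.0, 1.0]))  # [36, 30, 24, 36]
```

A real rate controller would additionally clamp the QP range and feed back the bit budget so the sequence still hits the target bit rate.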
Citations: 2
Efficient Multi-training Framework of Image Deep Learning on GPU Cluster
Pub Date : 2015-12-01 DOI: 10.1109/ISM.2015.119
Chun-Fu Chen, G. Lee, Yinglong Xia, Wan-Yi Sabrina Lin, T. Suzumura, Ching-Yung Lin
In this paper, we develop a pipelining schema for image deep learning on a GPU cluster to distribute the heavy workload of the training procedure. In addition, it is usually necessary to train multiple models to obtain a good deep learning model, due to the limited a priori knowledge of deep neural network structure. Therefore, adopting parallel and distributed computing appears to be an obvious path forward, but the mileage varies depending on how amenable a deep network is to parallelization and on the availability of rapid prototyping capabilities with a low cost of entry. In this work, we propose a framework that organizes the training procedures of multiple deep learning models into a pipeline on a GPU cluster, where each stage is handled by a particular GPU with a partition of the training dataset. Instead of frequently migrating data among the disks, CPUs, and GPUs, our framework only moves partially trained models, to reduce bandwidth consumption and to leverage the full computation capability of the cluster. We deploy the proposed framework on popular image recognition tasks using deep learning, and the experiments show that the proposed method reduces overall training time by up to dozens of hours compared to the baseline method.
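The scheduling idea can be illustrated with a toy time-stepped plan (names and structure are ours, not the paper's): each of the K stages owns one data partition ("GPU"), and at every step each stage trains a different model, so after a short fill phase all partitions stay busy while only model state moves between stages.

```python
def pipeline_schedule(n_models, n_stages):
    """Return a list of time steps; each step maps stage -> model trained
    at that step. Stages outside the window are idle during fill/drain."""
    steps = []
    for t in range(n_models + n_stages - 1):
        step = {}
        for stage in range(n_stages):
            model = t - stage          # model m enters stage s at time m + s
            if 0 <= model < n_models:
                step[stage] = model
        steps.append(step)
    return steps

for t, step in enumerate(pipeline_schedule(3, 3)):
    print(t, step)
# t=2 is the first fully utilized step: every stage trains a distinct model
```

With 3 models and 3 stages the pipeline fills at step 2 ({0: 2, 1: 1, 2: 0}) and drains over the last two steps.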
Citations: 7
A User-Based Framework for Group Re-Identification in Still Images
Pub Date : 2015-12-01 DOI: 10.1109/ISM.2015.41
Nestor Z. Salamon, Julio C. S. Jacques Junior, S. Musse
In this work we propose a framework for group re-identification based on manually defined soft-biometric characteristics. Users choose colors that describe the soft-biometric attributes of each person belonging to the searched group. Our technique matches these structured attributes against image databases using color distance metrics, a novel adaptive threshold selection, and a high-level feature based on people's proximity. Experimental results show that the proposed approach helps the re-identification procedure rank the most likely results without training data, and is also extensible to work without previous images.
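A minimal sketch of the color-matching step, under our own assumptions rather than the paper's implementation: candidate persons are compared to a user-chosen color by Euclidean RGB distance, and the acceptance threshold is derived adaptively from the observed distance distribution (here, mean minus half the standard deviation, an assumed form).

```python
import statistics

def color_distance(c1, c2):
    """Euclidean distance between two RGB triples."""
    return sum((a - b) ** 2 for a, b in zip(c1, c2)) ** 0.5

def adaptive_matches(query_rgb, candidates):
    """candidates: list of (person_id, rgb). Returns ids whose distance to
    the query falls under a threshold taken from the distances themselves."""
    dists = [(pid, color_distance(query_rgb, rgb)) for pid, rgb in candidates]
    values = [d for _, d in dists]
    threshold = statistics.mean(values) - 0.5 * statistics.pstdev(values)
    return [pid for pid, d in dists if d <= threshold]

people = [("p1", (250, 10, 10)), ("p2", (20, 240, 30)), ("p3", (240, 30, 20))]
print(adaptive_matches((255, 0, 0), people))  # ['p1', 'p3']
```

Deriving the threshold from the data itself is what lets such a system run without training data, as the abstract emphasizes.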
Citations: 1
Design and Development of a Cloud Based Cyber-Physical Architecture for the Internet-of-Things
Pub Date : 2015-12-01 DOI: 10.1109/ISM.2015.96
K. M. Alam, Alex Sopena, Abdulmotaleb El Saddik
The Internet-of-Things (IoT) is considered the next big disruptive technology field, whose main goal is to achieve social good by enabling collaboration among physical things or sensors. We present a cloud-based cyber-physical architecture to leverage the Sensing-as-a-Service (SenAS) model, where every physical thing is complemented by a twin cloud-based cyber process. In this model, things can communicate using direct physical connections or through the cyber layer using peer-to-peer inter-process communications. The proposed model offers simultaneous communication channels among groups of things by uniquely tagging each group with a relationship ID. An intelligent service layer ensures custom privacy and access rights management for the sensor owners. We also present the implementation details of an IoT platform and demonstrate its practicality by developing case study applications for the Internet-of-Vehicles (IoV) and the connected smart home.
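The relationship-ID grouping can be pictured as a small publish/subscribe broker in the cyber layer; the `CyberLayerBroker` API below is hypothetical and exists only to illustrate how a group tag routes a message to every twin process registered under it.

```python
from collections import defaultdict

class CyberLayerBroker:
    """Toy cyber-layer broker: things join groups by relationship ID, and a
    publish to that ID reaches every other member's twin cyber process."""
    def __init__(self):
        self._groups = defaultdict(set)   # relationship_id -> thing ids
        self.delivered = []               # (thing_id, message) delivery log

    def join(self, relationship_id, thing_id):
        self._groups[relationship_id].add(thing_id)

    def publish(self, relationship_id, sender_id, message):
        # deliver to all group members except the sender (sorted for a
        # deterministic order in this sketch)
        for thing in sorted(self._groups[relationship_id] - {sender_id}):
            self.delivered.append((thing, message))

broker = CyberLayerBroker()
for thing in ("car", "garage_door", "thermostat"):
    broker.join("home-42", thing)
broker.publish("home-42", "car", "arriving")
print(broker.delivered)  # [('garage_door', 'arriving'), ('thermostat', 'arriving')]
```

In the paper's architecture, a service layer on top of such routing would additionally enforce each sensor owner's privacy and access rights.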
Citations: 15
An Unified Image Tagging System Driven by Image-Click-Ads Framework
Pub Date : 2015-12-01 DOI: 10.1109/ISM.2015.12
Qiong Wu, P. Boulanger
With the exponential growth of web image data, image tagging is becoming crucial in many image-based applications such as object recognition and content-based image retrieval. Despite the great progress achieved in automatic recognition technologies, none has yet provided a satisfactory solution that is widely useful for generic image recognition problems. So far, only manual tagging can provide reliable tagging results. However, such work is tedious and costly, and workers have little motivation. In this paper, we propose an online image tagging system, EyeDentifyIt, driven by an image-click-ads framework, which motivates crowdsourcing workers as well as general web users to tag images at high quality for low cost with low workload. A series of usability studies demonstrates how EyeDentifyIt provides improved user motivation and requires less workload compared to state-of-the-art approaches.
Citations: 2
Reconstructing Missing Areas in Facial Images
Pub Date : 2015-12-01 DOI: 10.1109/ISM.2015.68
Christoph Jansen, Radek Mackowiak, N. Hezel, Moritz Ufer, Gregor Altstadt, K. U. Barthel
In this paper, we present a novel approach to reconstructing missing areas in facial images using a series of Restricted Boltzmann Machines (RBMs). RBMs created with a low number of hidden neurons generalize well and are able to reconstruct basic structures in the missing areas. On the other hand, networks with many hidden neurons tend to emphasize details when using the reconstructions of the previous, more generalized RBMs as their input. Since trained RBMs are fast at encoding and decoding data by design, our method is also suitable for processing video streams.
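The reconstruction mechanics can be sketched schematically: known pixels stay clamped while the missing region is repeatedly re-estimated through the hidden layer. The weights below are random stand-ins, so the filled values are not meaningful; a real use, as the abstract describes, would chain several trained RBMs from coarse to fine.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def reconstruct(v, known_mask, W, b_h, b_v, steps=10):
    """v: flattened image in [0, 1]; known_mask: True where pixels are
    observed. Each step passes the image through the hidden layer and
    writes the reconstruction back only into the missing region."""
    v = v.copy()
    for _ in range(steps):
        h = sigmoid(v @ W + b_h)             # hidden activations
        v_new = sigmoid(h @ W.T + b_v)       # visible reconstruction
        v[~known_mask] = v_new[~known_mask]  # clamp: only fill missing area
    return v

rng = np.random.default_rng(1)
W = rng.normal(scale=0.1, size=(16, 8))      # untrained stand-in weights
v = rng.random(16)                           # tiny 16-"pixel" image
mask = np.ones(16, dtype=bool)
mask[4:8] = False                            # 4 missing pixels
out = reconstruct(v, mask, W, np.zeros(8), np.zeros(16))
assert np.allclose(out[mask], v[mask])       # observed pixels are untouched
```

Chaining RBMs means running this loop once per network, feeding each stage's output to the next, with the coarse network's result guiding the detailed ones.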
Citations: 2
Human-Based Video Browsing - Investigating Interface Design for Fast Video Browsing
Pub Date : 2015-12-01 DOI: 10.1109/ISM.2015.104
Wolfgang Hürst, R. V. D. Werken
The Video Browser Showdown (VBS) is an annual event where researchers evaluate their video search systems in a competitive setting. Searching in videos is often a two-step process: first, some sort of pre-filtering is done, where, for example, users query an indexed archive of files; this is followed by human-based browsing, where users skim the returned result set in search of the relevant file or a portion of it. The VBS targets this whole search process, focusing in particular on its interactive aspects. Encouraged by previous years' results, we created a system that purely addresses the latter issue, i.e., interface and interaction design. By eliminating all kinds of video indexing and query processing, we aimed to demonstrate the importance of good interface design for video search, whose relevance is often underestimated by today's systems. This claim is clearly supported by the results our system achieved in the VBS 2015 competition, where our approach was on a par with the top-performing ones. In this paper, we describe our system along with related design decisions, present our results from the VBS event, and discuss them in further detail.
Citations: 7