
2011 IEEE International Symposium on Multimedia: Latest Publications

A Feasibility Study of Collaborative Stream Routing in Peer-to-Peer Multiparty Video Conferencing
Pub Date : 2011-12-05 DOI: 10.1109/ISM.2011.45
Han Zhao, D. Smilkov, P. Dettori, J. Nogima, F. Schaffa, P. Westerink, C. Wu
Video transmission in multiparty video conferencing is challenging due to demanding bandwidth usage and stringent latency requirements. In this paper, we systematically analyze the problem of collaborative stream routing using one-hop forwarding assistance in a bandwidth-constrained environment. We model the problem as a multi-source degree-constrained multicast tree construction problem, and investigate heuristic algorithms to construct bandwidth-feasible shared multicast trees. The contribution of this work is primarily twofold: (1) we study the solution space of finding a feasible bandwidth configuration for stream routing in a peer-to-peer (P2P) setting, and propose two heuristic algorithms that can quickly produce a bandwidth-feasible solution, making them suitable for large-scale conference sessions; (2) we conduct an empirical study using a realistic dataset and show the effectiveness of our heuristic algorithms. Various QoS metrics are taken into account to evaluate the performance of our algorithms. Finally, we discuss open issues for further exploration. The feasibility study presented in this paper will shed light on the design and implementation of practical P2P multiparty video conferencing applications.
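A degree-constrained multicast tree of the kind the abstract describes is commonly built greedily. The sketch below is an illustrative toy, not the paper's algorithms: it assumes a pairwise latency matrix and attaches each peer to the lowest-latency node that still has forwarding degree budget left.

```python
import heapq

def build_degree_constrained_tree(latency, source, degree_cap):
    """Greedily grow a multicast tree rooted at `source`, never letting
    any node forward to more than `degree_cap` children.
    `latency[i][j]` is a symmetric pairwise latency matrix."""
    n = len(latency)
    parent = {source: None}
    fanout = [0] * n
    # candidate edges: (latency, attached_node, new_node)
    heap = [(latency[source][v], source, v) for v in range(n) if v != source]
    heapq.heapify(heap)
    while len(parent) < n and heap:
        cost, u, v = heapq.heappop(heap)
        if v in parent or fanout[u] >= degree_cap:
            continue  # already attached, or u has no forwarding budget left
        parent[v] = u
        fanout[u] += 1
        for w in range(n):
            if w not in parent:
                heapq.heappush(heap, (latency[v][w], v, w))
    return parent
```

With `degree_cap=1` the heuristic degenerates into a low-latency forwarding chain, which makes the degree constraint easy to see in a small example.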
Citations: 1
Passive Forensics Method to Detect Tampering for Double JPEG Compression Image
Pub Date : 2011-12-05 DOI: 10.1109/ISM.2011.37
Zhenli Liu, Xiaofeng Wang, Jing Chen
A passive forensics method to detect tampering in double-JPEG-compressed images is proposed. In the proposed method, inconsistency of quality factors is used to detect double JPEG compression; a passive forensics approach then detects tampering and locates the tampered area in tampered JPEG images. Compared with existing methods, the main advantages of the proposed method are as follows: (1) it can detect rotation, scaling, and tampering in small areas; (2) it has high computing efficiency.
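The quality-factor inconsistency test starts from each image's quantization tables, which live in the JPEG DQT marker segments. A minimal pure-Python sketch of DQT parsing might look as follows (the function name is ours, and this reads only what precedes the entropy-coded data):

```python
def extract_quant_tables(jpeg_bytes):
    """Parse DQT segments (marker 0xFFDB) from a JPEG byte stream and
    return {table_id: [64 quantization values]} for 8-bit tables."""
    tables = {}
    i = 2  # skip the SOI marker (0xFFD8)
    while i + 4 <= len(jpeg_bytes):
        if jpeg_bytes[i] != 0xFF:
            break
        marker = jpeg_bytes[i + 1]
        if marker in (0xD9, 0xDA):  # EOI, or SOS: entropy-coded data follows
            break
        length = (jpeg_bytes[i + 2] << 8) | jpeg_bytes[i + 3]
        if marker == 0xDB:
            seg = jpeg_bytes[i + 4:i + 2 + length]
            j = 0
            while j < len(seg):  # a DQT segment may hold several tables
                precision, table_id = seg[j] >> 4, seg[j] & 0x0F
                if precision == 0:      # 8-bit entries
                    tables[table_id] = list(seg[j + 1:j + 65])
                    j += 65
                else:                   # 16-bit entries: skip in this sketch
                    j += 129
        i += 2 + length
    return tables
```

From the recovered tables, a quality factor can be estimated per image and compared across regions, which is the inconsistency signal the abstract refers to.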
Citations: 1
Multimodal Temporal Panorama for Moving Vehicle Detection and Reconstruction
Pub Date : 2011-12-05 DOI: 10.1109/ISM.2011.101
Tao Wang, Zhigang Zhu, Clark N. Taylor
In this work, we present a multimodal temporal panorama (MTP) representation that synchronizes visual, motion, and acoustic signatures of moving vehicles along the time axis. The MTP representation includes two layers: a synopsis layer and a snapshot layer. The temporal synopsis consists of 1) a panoramic view image (PVI) to represent vehicles' presence, which is constructed from 1D vertical detecting lines of a selected column location of all video frames, 2) an epipolar plane image (EPI) to characterize their motion (speeds and directions), generated from 1D horizontal scanning lines along the vehicles' moving paths, and 3) an audio wave scroll for visualizing moving vehicles' acoustic signatures. The MTP synopsis not only synchronizes all three modalities (visual, motion, and acoustic) of the vehicles, but also provides information that supports automatic detection tasks including moving vehicle visual detection, motion estimation, and acoustic signature retrieval. Then in the snapshot layer, the occlusion-free, motion-blur-free, and view-invariant reconstruction of each vehicle (with both shape and motion information) and its acoustic signatures (e.g. spectrogram) are embedded. The MTP provides a very effective approach to (semi-)automatically labeling the multimodal data of uncontrolled traffic scenes in real time for further vehicle classification, check-point inspection and traffic analysis. The concept of MTP is not limited to visual, motion and audio modalities; it could also apply to other sensing modalities that can obtain data in the temporal domain.
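The PVI and EPI constructions described above reduce to slicing a video volume along fixed lines. A minimal NumPy sketch (function names are ours, for illustration only):

```python
import numpy as np

def panoramic_view_image(frames, column):
    """Build a PVI by stacking one fixed vertical detecting line
    (image column `column`) from every frame along the time axis."""
    # frames: (T, H, W) grayscale volume; result: (H, T)
    return np.stack([f[:, column] for f in frames], axis=1)

def epipolar_plane_image(frames, row):
    """Build an EPI from one horizontal scanning line (`row`) per frame."""
    # result: (T, W); moving objects trace slanted streaks whose slope
    # encodes their speed and direction
    return np.stack([f[row, :] for f in frames], axis=0)
```

In the PVI, a vehicle appears whenever it crosses the detecting column; in the EPI, the slope of its streak gives the motion cue the abstract mentions.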
Citations: 5
Hybrid Video Compression Using Selective Keyframe Identification and Patch-Based Super-Resolution
Pub Date : 2011-12-05 DOI: 10.1109/ISM.2011.25
J. Glaister, Calvin Chan, M. Frankovich, Adrian Tang, A. Wong
This paper details a novel video compression pipeline using selective key frame identification to encode video and patch-based super-resolution to decode for playback. Selective key frame identification uses shot boundary detection and frame differencing methods to identify representative frames, which are subsequently kept in high resolution within the compressed container. All other non-key frames are downscaled for compression purposes. Patch-based super-resolution finds similar patches between an upscaled non-key frame and the associated high-resolution key frame to regain lost detail via a super-resolution process. The algorithm was integrated into the H.264 video compression pipeline and tested on webcam, cartoon, and live-action video for both streaming and storage purposes. Experimental results show that the proposed hybrid video compression pipeline successfully achieved higher compression ratios than standard H.264, while achieving superior video quality than low resolution H.264 at similar compression ratios.
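The frame-differencing side of key frame identification can be sketched with a simple histogram-distance test. This toy (our own threshold and bin choices, not the paper's detector) marks a frame as a key frame when its intensity histogram drifts far enough from the last key frame:

```python
import numpy as np

def select_keyframes(frames, threshold=0.25):
    """Mark frame t as a key frame when its normalized histogram differs
    from the last key frame's by more than `threshold` (a crude
    shot-boundary / frame-differencing test on 8-bit grayscale frames)."""
    keys = [0]
    last = np.histogram(frames[0], bins=16, range=(0, 256))[0]
    last = last / last.sum()
    for t in range(1, len(frames)):
        hist = np.histogram(frames[t], bins=16, range=(0, 256))[0]
        hist = hist / hist.sum()
        if np.abs(hist - last).sum() / 2 > threshold:  # half-L1 distance in [0, 1]
            keys.append(t)
            last = hist
    return keys
```

A real shot-boundary detector would add temporal smoothing and adaptive thresholds, but the structure of the decision is the same.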
Citations: 6
An Adaptive Approach for Authoring Interactivity for Rich Multimedia Content
Pub Date : 2011-12-05 DOI: 10.1109/ISM.2011.39
M. Palviainen, S. Dutton
This paper describes an adaptive content authoring approach, the LIMO Authoring Tool that supports the use of the approach, and two examples in which an editor is adapted for presentation skeletons. The adapted editor aids users as they create content to be attached to the presentation skeleton, which specifies a ready-made baseline (e.g. skeleton, layout, and code libraries) for the presentation. The adapted editor does not just facilitate content creation but can also reduce errors and provide more robust, error-free content.
Citations: 0
Efficient Clustering-based Algorithm for Predicting File Size and Structural Similarity of Transcoded JPEG Images
Pub Date : 2011-12-05 DOI: 10.1109/ISM.2011.30
S. Pigeon, S. Coulombe
The problem of adapting JPEG images to satisfy constraints such as file size and resolution arises in a number of applications, from universal media access to multimedia messaging services. Visually optimized adaptation, however, commands a non-negligible computational cost, which we aim to minimize using predictors. In previous works, we presented predictors and systems to achieve low-cost near-optimal adaptation of JPEG images. In this work, we propose a new approach to predicting the file size and quality resulting from the transcoding of a JPEG image subject to changes in quality factor and resolution. We show that the new predictor significantly outperforms the previously proposed solutions in accuracy.
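The idea of a clustering-based size predictor can be illustrated with a toy nearest-centroid model: past transcoding outcomes are pooled per cluster of (quality factor, scaling) operations, and a new operation is predicted by its cluster's mean outcome. This is a stand-in sketch under our own assumptions, not the paper's predictor:

```python
import math

class ClusterSizePredictor:
    """Toy predictor: average the observed relative file sizes of past
    transcodings that fall in the same (quality factor, scale) cluster."""
    def __init__(self, centroids):
        self.centroids = centroids          # list of (qf, scale) cluster centers
        self.sums = [0.0] * len(centroids)
        self.counts = [0] * len(centroids)

    def _nearest(self, qf, scale):
        # scale is in [0, 1] while qf is in [1, 100], so reweight scale
        return min(range(len(self.centroids)),
                   key=lambda k: math.hypot(self.centroids[k][0] - qf,
                                            100 * (self.centroids[k][1] - scale)))

    def observe(self, qf, scale, relative_size):
        k = self._nearest(qf, scale)
        self.sums[k] += relative_size
        self.counts[k] += 1

    def predict(self, qf, scale, default=1.0):
        k = self._nearest(qf, scale)
        return self.sums[k] / self.counts[k] if self.counts[k] else default
```

The appeal of such predictors is that a lookup replaces an actual trial transcoding, which is where the computational savings come from.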
Citations: 7
Improved Multi-Rate Video Encoding
Pub Date : 2011-12-05 DOI: 10.1109/ISM.2011.53
Dag Haavi Finstad, H. Stensland, H. Espeland, P. Halvorsen
Adaptive HTTP streaming is frequently used for both live and on-demand video delivery over the Internet. Adaptiveness is often achieved by encoding the video stream in multiple qualities (and thus bit rates), and then transparently switching between the qualities according to the bandwidth fluctuations and the amount of resources available for decoding the video content on the end device. For this kind of video delivery over the Internet, H.264 is currently the most used codec, but VP8 is an emerging open-source codec expected to compete with H.264 in the streaming scenario. The challenge is that, when encoding video for adaptive video streaming, both VP8 and H.264 run once for each quality layer, i.e., consuming both time and resources, which is especially important in a live video delivery scenario. In this paper, we address the resource consumption issues by proposing a method for reusing redundant steps in a video encoder, emitting multiple outputs with varying bit rates and qualities. It shares and reuses the computationally heavy analysis step, notably macro-block mode decision, intra prediction and inter prediction, between the instances, and outputs video at several rates. The method has been implemented in the VP8 reference encoder, and experimental results show that we can encode the different quality layers at the same rates and qualities compared to the VP8 reference encoder, while reducing the encoding time significantly.
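The structure of the reuse idea, namely one shared analysis pass feeding several rate-specific encodes, can be shown with a small sketch. The function and callback names here are ours, purely to make the control flow concrete:

```python
def encode_multirate(macroblocks, bitrates, analyze, encode_mb):
    """Run the heavy analysis (e.g. macro-block mode decision) once per
    macroblock, then reuse the result for every output bit rate instead
    of re-running the analysis per quality layer."""
    analyses = [analyze(mb) for mb in macroblocks]   # heavy step, done once
    return {rate: [encode_mb(mb, a, rate)            # cheap per-rate step
                   for mb, a in zip(macroblocks, analyses)]
            for rate in bitrates}
```

With N quality layers this replaces N analysis passes with one, which is the source of the encoding-time reduction the abstract reports.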
Citations: 19
Skin Region Extraction and Person-Independent Deformable Face Templates for Fast Video Indexing
Pub Date : 2011-12-05 DOI: 10.1109/ISM.2011.75
S. Clippingdale, Mahito Fujii
We describe a face tracking and recognition system for video and multimedia indexing that handles face regions at variable face poses (left-right and up-down), and deformations due to facial expressions and speech, by employing person-independent deformable templates at multiple poses on the view-sphere. An earlier version of the system handled variable poses (left-right only) by employing person-specific templates registered for each target individual at multiple poses. The new system speeds up processing by (i) extracting and restricting attention to skin-color regions, (ii) performing recognition using person-specific templates at near-frontal poses only, and (iii) tracking at non-frontal poses using the person-independent templates. Registration is also simplified, since multiple views of each target individual are no longer required, at the cost of a loss of recognition functionality at poses far from frontal (the system instead "remembers" the identity of each individual from near-frontal matches and tracks between them). We describe the skin region extraction process and the process by which the person-independent templates are constructed off-line from "bootstrap" face images of multiple non-target individuals, and we present experimental results showing the system in operation. Finally we discuss remaining issues in the practical application of the system to video and multimedia archive indexing.
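Restricting attention to skin-color regions is typically done by thresholding in a chrominance space. The sketch below uses one widely cited YCbCr rule (Cb in [77, 127], Cr in [133, 173]); the paper does not state which rule it uses, so treat this as an illustrative example only:

```python
import numpy as np

def skin_mask_ycbcr(rgb):
    """Return a boolean skin mask for an (H, W, 3) uint8 RGB image by
    converting to YCbCr chrominance and applying fixed thresholds."""
    rgb = rgb.astype(np.float64)
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    # ITU-R BT.601 chrominance, offset to the [0, 255] range
    cb = 128 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return (77 <= cb) & (cb <= 127) & (133 <= cr) & (cr <= 173)
```

Only the pixels inside the mask need to be passed to the template matcher, which is where the reported speed-up comes from.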
Citations: 5
Utilization of Co-occurrence Relationships between Semantic Concepts in Re-ranking for Information Retrieval
Pub Date : 2011-12-05 DOI: 10.1109/ISM.2011.18
Chao Chen, Lin Lin, M. Shyu
Semantic information retrieval is a popular research topic in the multimedia area. The goal of the retrieval is to provide end users with results that are as relevant as possible. Many research efforts have been made to build ranking models for different semantic concepts (or classes). While some of them have been proven to be effective, others are still far from satisfactory. Our observation that certain target semantic concepts have high co-occurrence relationships with easy-to-retrieve semantic concepts (called reference semantics) has motivated us to utilize such co-occurrence relationships between semantic concepts in information retrieval and re-ranking. In this paper, we propose a novel semantic retrieval and re-ranking framework that takes advantage of the co-occurrence relationships between a target semantic concept and a reference semantic concept to re-rank the retrieved results. The proposed framework discretizes the training data into a set of feature-value pairs and employs Multiple Correspondence Analysis (MCA) to capture the correlation, in terms of the impact weight, between feature-value pairs and the positive-positive class in which the data instances belong to both the target semantic concept and the reference semantic concept. A combination of all these impact weights is utilized to re-rank the retrieved results for the target semantic concept. Comparative experiments are designed and evaluated on TRECVID 2005 and TRECVID 2010 video collections with publicly available ranking scores. Experimental results on different retrieval scales demonstrate that our proposed framework can enhance the retrieval results for the target semantic concepts in terms of average precision, and the improvements for some semantic concepts are promising.
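The re-ranking step reduces to boosting each item's target-concept score by its reference-concept score, scaled by a learned co-occurrence weight. In the paper that weight comes from MCA; in the toy sketch below it is a fixed scalar, and all names are ours:

```python
def rerank(target_scores, reference_scores, weight, alpha=0.5):
    """Re-rank retrieval results for a target concept by boosting items
    that also score highly for a co-occurring reference concept.
    `weight` stands in for the learned co-occurrence impact weight."""
    boosted = {item: s + alpha * weight * reference_scores.get(item, 0.0)
               for item, s in target_scores.items()}
    return sorted(boosted, key=boosted.get, reverse=True)
```

An item that is hard to score for the target concept but co-occurs strongly with an easy-to-retrieve reference concept can thus climb in the ranking.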
Citations: 6
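The co-occurrence-driven re-ranking idea in the abstract above can be sketched in a few lines. This is a simplified illustration, not the authors' MCA-derived weighting: the function name `rerank` and the single blend weight `w_ref` are hypothetical stand-ins for the learned impact weights.

```python
def rerank(target_scores, reference_scores, w_ref=0.3):
    """Blend target-concept scores with a co-occurring reference concept.

    target_scores / reference_scores: dicts mapping shot id -> score.
    w_ref: impact weight of the reference concept (hypothetical value;
           the paper instead learns such weights via MCA over
           feature-value pairs of the training data).
    """
    combined = {
        shot: (1.0 - w_ref) * target_scores[shot]
              + w_ref * reference_scores.get(shot, 0.0)
        for shot in target_scores
    }
    # Higher combined score -> earlier position in the re-ranked list.
    return sorted(combined, key=combined.get, reverse=True)

target = {"s1": 0.9, "s2": 0.4, "s3": 0.5}
reference = {"s1": 0.1, "s2": 0.95, "s3": 0.2}
print(rerank(target, reference))  # → ['s1', 's2', 's3']
```

A shot that scores low on the target concept but high on a strongly co-occurring reference concept can thus move up the ranked list.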
A Layered Approach for Fast Multi-view Stereo Panorama Generation
Pub Date : 2011-12-05 DOI: 10.1109/ISM.2011.104
E. Molina, Zhigang Zhu, Clark N. Taylor
In this paper we propose a fast method for constructing multi-view stereo panoramas using a layering approach. Constructing panoramas requires accurate camera pose estimation and will often require an image blending or interpolation method to generate seamless results. We use a registration error correction method that provides globally corrected and fast results for paths that create cycles such as circular paths, back and forth straight sweeps, and even a single sweep. Then we apply our layering approach to generate multi-view stereo panoramas quickly for time sensitive applications that require immediate results and 3D perception.
Citations: 3
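The global registration-error correction for closed paths mentioned in the abstract can be illustrated with a minimal sketch. This is an assumption-level example, not the authors' method: it simply redistributes the residual drift, measured when a circular sweep returns to its starting frame, evenly across all frames.

```python
def correct_cycle(offsets, expected_total):
    """Distribute loop-closure drift over a closed sweep.

    offsets: per-frame pairwise registration offsets (e.g. x-shift in
             pixels between consecutive frames of a circular sweep).
    expected_total: the offset sum a perfectly closed loop should give
                    (e.g. the full 360-degree panorama width).
    """
    drift = sum(offsets) - expected_total   # residual error around the loop
    per_frame = drift / len(offsets)        # equal share for each frame
    return [o - per_frame for o in offsets]

# Five frames that should span 50 px in total but accumulated 0.5 px drift.
corrected = correct_cycle([10.5, 9.5, 10.5, 9.5, 10.5], expected_total=50.0)
print(corrected)  # corrected offsets now sum to the expected total
```

Spreading the error globally avoids the visible seam that would appear if the entire accumulated drift were absorbed at the point where the loop closes.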