
2012 IEEE International Symposium on Multimedia: Latest Publications

A Generic Audio Identification System for Radio Broadcast Monitoring Based on Data-Driven Segmentation
Pub Date : 2012-12-10 DOI: 10.1109/ISM.2012.87
H. Khemiri, D. Petrovska-Delacrétaz, G. Chollet
In this paper, a generic audio identification system is introduced to identify advertisements and songs in radio broadcast streams using automatically acquired segmental units. A new fingerprinting method based on ALISP data-driven segmentation is presented. A modified BLAST algorithm is also proposed for fast, approximate matching of ALISP sequences. To detect commercials and songs, ALISP transcriptions of reference items, drawn from a large library of commercials and songs, are compared to the transcriptions of the test radio stream using the Levenshtein distance. The system is described and evaluated on broadcast audio streams from 12 French radio stations. For advertisement identification, a mean precision of 100% with a corresponding recall of 98% was achieved; for music identification, a mean precision of 100% with a corresponding recall of 95% was achieved.
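The core matching step, comparing ALISP symbol sequences with the Levenshtein distance, can be illustrated with a minimal sketch. This is not the authors' implementation; the symbol names and the similarity threshold are made up for illustration.

```python
# Minimal sketch of matching ALISP-like symbol sequences with the Levenshtein
# (edit) distance. Symbols and the 0.8 threshold are illustrative placeholders.

def levenshtein(a, b):
    """Edit distance between two symbol sequences (lists or strings)."""
    m, n = len(a), len(b)
    prev = list(range(n + 1))
    for i in range(1, m + 1):
        curr = [i] + [0] * n
        for j in range(1, n + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            curr[j] = min(prev[j] + 1,         # deletion
                          curr[j - 1] + 1,     # insertion
                          prev[j - 1] + cost)  # substitution
        prev = curr
    return prev[n]

# Hypothetical ALISP transcriptions of a reference ad and a stream segment.
reference = ["Ha", "Hb", "Hf", "Hc", "Hd"]
segment   = ["Ha", "Hb", "Hc", "Hd"]

dist = levenshtein(reference, segment)
similarity = 1 - dist / max(len(reference), len(segment))
print(f"distance={dist}, similarity={similarity:.2f}")
# A segment would be declared a match if the similarity exceeds a tuned threshold, e.g. 0.8.
```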
Citations: 4
Spatial and Temporal Information as Camera Parameters for Super-resolution Video
Pub Date : 2012-12-10 DOI: 10.1109/ISM.2012.63
Jussi Tarvainen, M. Nuutinen, P. Oittinen
Most modern consumer cameras are capable of video capture, but their spatial resolution is generally lower than that of still images. The spatial resolution of video can be enhanced with a hybrid camera system that combines information from high-resolution still images with low-resolution video frames in a process known as super-resolution. Because this process is computationally intensive, we propose a camera system that uses the spatial and temporal information measures SI and TI, standardized by the ITU, as camera parameters to determine during capture whether super-resolution processing would increase perceived quality. Experimental results show that the difference between these two measures can be used to determine the feasibility of super-resolution processing.
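A rough sketch of how per-frame SI and TI can be computed following the ITU-T Rec. P.910 definitions (standard deviation of the Sobel-filtered frame, and of the frame difference). This is an illustration under those definitions, not the authors' code, and the random frames are placeholders.

```python
# Sketch of the ITU-T P.910 spatial (SI) and temporal (TI) information measures
# over a sequence of grayscale frames. Frames here are random placeholders.
import numpy as np
from scipy import ndimage

def spatial_information(frame):
    # SI of one frame: std. dev. of the Sobel gradient magnitude of the luminance plane.
    f = frame.astype(float)
    gx = ndimage.sobel(f, axis=0)
    gy = ndimage.sobel(f, axis=1)
    return np.hypot(gx, gy).std()

def temporal_information(prev_frame, frame):
    # TI of one frame: std. dev. of the pixel-wise difference to the previous frame.
    return (frame.astype(float) - prev_frame.astype(float)).std()

frames = [np.random.randint(0, 256, (480, 640), dtype=np.uint8) for _ in range(10)]

si = max(spatial_information(f) for f in frames)
ti = max(temporal_information(p, f) for p, f in zip(frames, frames[1:]))
print(f"SI={si:.1f}, TI={ti:.1f}")
# The paper's idea: use (SI, TI) measured at capture time to decide whether
# super-resolution processing is likely to improve perceived quality.
```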
Citations: 6
Super-resolution Using GMM and PLS Regression
Pub Date : 2012-12-10 DOI: 10.1109/ISM.2012.62
Y. Ogawa, Takahiro Hori, T. Takiguchi, Y. Ariki
In recent years, super-resolution techniques in the field of computer vision have been studied in earnest owing to their potential applicability in a variety of fields. In this paper, we propose a single-image super-resolution approach using a Gaussian Mixture Model (GMM) and Partial Least Squares (PLS) regression. A GMM-based super-resolution technique is shown to be more efficient than previously known techniques, such as sparse-coding-based techniques, but the GMM-based conversion may result in overfitting. We therefore propose an effective technique for preventing overfitting that combines PLS regression with a GMM. The conversion function is constructed using the input image and its self-reduced image. The high-resolution image is obtained by applying the conversion function to the enlarged input image without any outside database. We confirmed the effectiveness of the proposed method through our experiments.
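As a rough illustration of the regression component only, the sketch below fits a PLS mapping from low-resolution patch vectors to high-resolution patch vectors using scikit-learn's PLSRegression on synthetic data. It omits the paper's GMM clustering and self-reduction steps, so it is a sketch of the idea rather than the authors' pipeline.

```python
# Sketch of PLS regression mapping low-res patch vectors to high-res patch vectors.
# Synthetic data stands in for patches extracted from the input image and its
# self-reduced version; patch sizes and component count are illustrative.
import numpy as np
from sklearn.cross_decomposition import PLSRegression

rng = np.random.default_rng(0)

n_patches = 500
lowres = rng.normal(size=(n_patches, 25))   # 5x5 low-res patches, flattened
# Fake high-res targets: a linear map of the low-res patches plus noise.
highres = lowres @ rng.normal(size=(25, 100)) + 0.1 * rng.normal(size=(n_patches, 100))

pls = PLSRegression(n_components=10)   # a small number of latent components limits overfitting
pls.fit(lowres, highres)

test_patch = rng.normal(size=(1, 25))
predicted_highres = pls.predict(test_patch)   # shape (1, 100), i.e. a 10x10 patch
print(predicted_highres.shape)
```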
Citations: 2
A Study on Caption Recognition for Multi-color Characters on Complex Background
Pub Date : 2012-12-10 DOI: 10.1109/ISM.2012.83
Yutaka Katsuyama, A. Minagawa, Y. Hotta, Jun Sun, S. Omachi
We propose a caption recognition method for multi-color characters on complex backgrounds. Caption characters enable efficient search over large amounts of recorded TV programs. In caption character recognition, the caption appearance section and its area are extracted, and the character strokes are then extracted from that area and recognized. This paper focuses on caption character stroke extraction and recognition for multi-color characters on complex backgrounds, which is a very difficult task for conventional methods. The proposed method extracts decomposed binary images from the input color caption image by color clustering. Character candidates composed of combinations of connected components are then extracted using recognition certainty. Finally, characters are selected with a beyond-color Dynamic Programming method that weights recognition certainty and character alignment. In a character recognition evaluation on one-line multi-color character strings over complex backgrounds, a large improvement was achieved over a conventional technique that can recognize only single-color characters on a complex background image.
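The first stage, decomposing a color caption image into binary images by color clustering, can be sketched as follows. K-means here is a stand-in for the paper's clustering, and the random image and cluster count are illustrative assumptions.

```python
# Sketch of color-clustering decomposition: each color cluster yields one binary
# image that a stroke extractor / character recognizer could consume.
# A random image and k=4 clusters are placeholders.
import numpy as np
from sklearn.cluster import KMeans

caption = np.random.randint(0, 256, (40, 200, 3), dtype=np.uint8)  # H x W x RGB

pixels = caption.reshape(-1, 3).astype(float)
labels = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(pixels)
labels = labels.reshape(caption.shape[:2])

binary_layers = [(labels == k).astype(np.uint8) for k in range(4)]
for k, layer in enumerate(binary_layers):
    print(f"cluster {k}: {layer.sum()} pixels")
# Each layer would then go through connected-component extraction and character
# recognition, with recognition certainty used to pick the candidate characters.
```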
Citations: 2
Automatic Actor Recognition for Video Services on Mobile Devices
Pub Date : 2012-12-10 DOI: 10.1109/ISM.2012.80
Lai-Tee Cheok, Sol Yee Heo, Donato Mitrani, Anshuman Tewari
Face recognition is one of the most promising and successful applications of image analysis and understanding. Applications include biometric identification, gaze estimation, emotion recognition, and human-computer interfaces, among others. A closed system trained to recognize only a predetermined set of faces becomes obsolete very easily. In this paper, we describe a demo that we have developed using face detection and recognition algorithms for recognizing actors and actresses in movies. The demo runs on a Samsung tablet and recognizes the actors in the video. We also present a method that allows the user to interact with the system during training while watching video. New faces are tracked and trained into new face classifiers as the video plays, and the face database is updated dynamically.
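A minimal sketch of the dynamic face-database idea: a nearest-neighbor store of face embeddings that grows while the video plays. The embeddings would come from a face detection and recognition model in practice; here they are random vectors, and the names and distance threshold are illustrative assumptions, not the paper's classifier.

```python
# Sketch of a dynamically growing face database matched by nearest neighbor.
# Embeddings, names, and the distance threshold are illustrative placeholders.
import numpy as np

class FaceDatabase:
    def __init__(self, threshold=0.6):
        self.entries = []          # list of (name, embedding vector)
        self.threshold = threshold

    def identify(self, embedding):
        """Return the closest known name, or None if nothing is close enough."""
        best_name, best_dist = None, float("inf")
        for name, vec in self.entries:
            dist = np.linalg.norm(vec - embedding)
            if dist < best_dist:
                best_name, best_dist = name, dist
        return best_name if best_dist < self.threshold else None

    def enroll(self, name, embedding):
        """Add a newly labeled face while the video keeps playing."""
        self.entries.append((name, embedding))

db = FaceDatabase()
db.enroll("Actor A", np.random.rand(128))
print(db.identify(np.random.rand(128)))  # likely None: a random query is far from the stored face
```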
Citations: 1
Visual Text Features for Image Matching
Pub Date : 2012-12-10 DOI: 10.1109/ISM.2012.84
Sam S. Tsai, Huizhong Chen, David M. Chen, Vasu Parameswaran, R. Grzeszczuk, B. Girod
We present a new class of visual text features that are based on text in camera-phone images. A robust text detection algorithm locates individual text lines and feeds them to a recognition engine. From the recognized characters, we generate visual text features in a way that resembles image features: we calculate their location, scale, and orientation, and a descriptor that captures the character and word information. We apply visual text features to image matching. To disambiguate false matches, we developed a word-distance matching method. Our experiments with images that contain text show that the new visual text feature based image matching pipeline performs on par with or better than a conventional image feature based pipeline while requiring less than 10 bits per feature, 4.5× smaller than state-of-the-art visual feature descriptors.
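A small sketch of what a visual text feature and a word-distance matching step might look like. The feature fields, the use of difflib for an approximate word distance, and the 0.8 cutoff are assumptions for illustration, not the paper's exact descriptor or metric.

```python
# Sketch of visual text features (location, scale, orientation plus the recognized
# word) and an approximate word-distance match between two images.
from dataclasses import dataclass
from difflib import SequenceMatcher

@dataclass
class VisualTextFeature:
    x: float            # location of the recognized word in the image
    y: float
    scale: float        # text height in pixels
    orientation: float  # text line angle in radians
    word: str           # recognized characters

def word_similarity(a, b):
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

query_feats = [VisualTextFeature(120, 40, 18, 0.0, "ESPRESSO"),
               VisualTextFeature(120, 70, 14, 0.0, "BAR")]
db_feats    = [VisualTextFeature(300, 90, 22, 0.1, "ESPRESS0"),   # OCR error: O read as 0
               VisualTextFeature(300, 130, 16, 0.1, "GRILL")]

matches = [(q.word, d.word) for q in query_feats for d in db_feats
           if word_similarity(q.word, d.word) > 0.8]
print(matches)   # [('ESPRESSO', 'ESPRESS0')] despite the recognition error
```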
Citations: 7
SAMHIS: A Robust Motion Space for Human Activity Recognition
Pub Date : 2012-12-10 DOI: 10.1109/ISM.2012.75
S. Raghuraman, B. Prabhakaran
In recent years, many local-descriptor-based approaches have been proposed for human activity recognition that perform well on challenging datasets. However, most of these approaches are computationally intensive, extract irrelevant background features, and fail to capture global temporal information. We propose to overcome these issues by introducing a compact and robust motion space that can be used to extract both spatial and temporal aspects of activities using local descriptors. We present the Speed Adapted Motion History Image Space (SAMHIS), which employs a variant of the Motion History Image to represent motion. This space alleviates both self-occlusion and the speed-related issues associated with different kinds of motion. We then show, using a standard bag-of-visual-words model, that extracting appearance-based local descriptors from this space is very effective for recognizing activity. Our approach yields promising results on the KTH and Weizmann datasets.
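The underlying Motion History Image representation, of which SAMHIS is a speed-adapted variant, can be sketched in a few lines. The frames, decay rate, and motion threshold below are synthetic placeholders, and the speed adaptation itself is not reproduced.

```python
# Sketch of a plain Motion History Image (MHI): recent motion appears bright,
# older motion decays. SAMHIS adapts this idea to motion speed; that part is omitted.
import numpy as np

def update_mhi(mhi, prev_frame, frame, tau=255, decay=32, motion_thresh=30):
    moving = np.abs(frame.astype(int) - prev_frame.astype(int)) > motion_thresh
    mhi = np.maximum(mhi.astype(int) - decay, 0)   # fade previously recorded motion
    mhi[moving] = tau                              # stamp current motion at full intensity
    return mhi.astype(np.uint8)

frames = [np.random.randint(0, 256, (120, 160), dtype=np.uint8) for _ in range(8)]
mhi = np.zeros((120, 160), dtype=np.uint8)
for prev, cur in zip(frames, frames[1:]):
    mhi = update_mhi(mhi, prev, cur)

print(mhi.max(), mhi.mean())
# Local descriptors extracted from a space like this (with a bag-of-visual-words
# model) drive the activity classifier in the paper.
```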
Citations: 1
Energy Conservation in 802.11 WLAN for Mobile Video Calls
Pub Date : 2012-12-01 DOI: 10.1109/ISM.2012.57
Haiyang Ma, Roger Zimmermann
Battery-powered mobile devices suffer from significant power consumption by the WiFi network interface during video calls. Utilizing the dynamic Power Save Mode (PSM), our study proposes an adaptive RTP packet transmission scheme for multimedia traffic. By merging outbound packet delivery timing with inbound packet reception and estimating each delay component along the packet processing and transmission path, each client manages to meet the stringent end-to-end latency requirement for packets while creating longer sleep intervals. As a benefit, it involves no cross-layer communication overhead, since the interface state transitions are completely transparent to the application. The experimental results show that 28.53% energy savings on the WiFi interface can be achieved while maintaining satisfactory application performance.
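A toy sketch of the scheduling idea: given an end-to-end latency budget and estimated per-stage delays, compute how long an outbound RTP packet can be held so that its transmission coincides with the next expected inbound packet, letting the interface sleep in between. All delay values and the budget below are made-up numbers, not measurements from the paper.

```python
# Toy sketch of delay-budget-based packet holding for a mobile video call.
# A real implementation would estimate these delay components continuously.

LATENCY_BUDGET_MS = 150.0          # illustrative end-to-end target for interactive video

def max_hold_time(capture_delay, encode_delay, network_delay, decode_delay):
    """How long an outbound packet may wait before it must be sent."""
    spent = capture_delay + encode_delay + network_delay + decode_delay
    return max(0.0, LATENCY_BUDGET_MS - spent)

def schedule_send(now_ms, next_inbound_ms, hold_budget_ms):
    """Send together with the next inbound burst if the delay budget allows it."""
    wait = next_inbound_ms - now_ms
    return next_inbound_ms if 0 <= wait <= hold_budget_ms else now_ms

hold = max_hold_time(capture_delay=15, encode_delay=30, network_delay=60, decode_delay=20)
send_at = schedule_send(now_ms=1000.0, next_inbound_ms=1012.0, hold_budget_ms=hold)
print(f"hold budget {hold:.0f} ms, send at t={send_at:.0f} ms")
# Between the merged send/receive bursts the WiFi interface can stay in PSM sleep.
```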
Citations: 1
Exploiting Color Strength to Improve Color Correction
Pub Date : 2012-12-01 DOI: 10.1109/ISM.2012.43
L. Brown, A. Datta, Sharath Pankanti
Color information is an important feature for many vision algorithms, including color correction, image retrieval, and tracking. In this paper, we study the limitations of color measurement accuracy and explore how this information can be used to improve the performance of color correction. In particular, we show that a strong correlation exists between the error in hue measurements on the one hand and saturation and intensity on the other. We introduce the notion of color strength, a combination of saturation and intensity information that determines when hue information in a scene is reliable. We verify the predictive capability of this model on two different datasets with ground-truth color information. Further, we show how color strength information can be used to significantly improve color correction accuracy on the 11K real-world SFU gray ball dataset.
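One plausible reading of "color strength" is a simple combination of saturation and intensity used to gate whether a pixel's hue is trusted. The sketch below uses that reading with HSV values; the product form and the 0.15 threshold are assumptions for illustration, not the paper's exact formula.

```python
# Sketch of gating hue by a "color strength" built from saturation and intensity.
import colorsys

def color_strength(r, g, b):
    """Return (hue, strength) for an RGB pixel with components in [0, 1]."""
    h, s, v = colorsys.rgb_to_hsv(r, g, b)
    return h, s * v        # weak saturation or low intensity -> unreliable hue

pixels = {"vivid red": (0.9, 0.1, 0.1),
          "dark red":  (0.12, 0.02, 0.02),
          "near gray": (0.55, 0.52, 0.53)}

for name, rgb in pixels.items():
    hue, strength = color_strength(*rgb)
    reliable = strength > 0.15
    print(f"{name:10s} hue={hue:.2f} strength={strength:.2f} hue reliable: {reliable}")
# Only pixels with sufficient color strength would contribute hue evidence to
# color correction; the rest would fall back on intensity information.
```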
Citations: 14
English2MindMap: An Automated System for MindMap Generation from English Text
Pub Date : 2012-12-01 DOI: 10.1109/ISM.2012.103
Mohamed Elhoseiny, A. Elgammal
Mind mapping is a well-known note-taking technique that is known to encourage learning and studying. Moreover, mind maps can be a very good way to present knowledge and concepts in visual form. Unfortunately, there is no reliable automated tool that can generate mind maps from natural-language text. This paper fills that gap by developing the first evaluated automated system that takes text input and generates a mind map visualization from it. The system can also visualize large text documents as multi-level mind maps in which a high-level mind map node can be expanded into child mind maps. The proposed approach involves understanding the input text and converting it into an intermediate Detailed Meaning Representation (DMR). The DMR is then visualized with one of two proposed approaches, single-level or multi-level, the latter being convenient for larger texts. The mind maps generated by both approaches were evaluated through human-subject experiments performed on Amazon Mechanical Turk with various parameter settings.
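As a toy illustration of the text-to-mind-map direction only, the sketch below turns a paragraph into a two-level tree with one child node per sentence. It skips the paper's DMR stage entirely; the labeling rule (first few words per sentence) is an arbitrary placeholder.

```python
# Toy sketch: build a two-level mind map (root plus one node per sentence) from text.
# The real system builds a Detailed Meaning Representation first; this omits that step
# and only illustrates the output structure.
import re

def build_mindmap(title, text, label_words=4):
    sentences = [s.strip() for s in re.split(r"[.!?]", text) if s.strip()]
    children = [" ".join(s.split()[:label_words]) for s in sentences]
    return {"root": title, "children": children}

def render(mindmap):
    print(mindmap["root"])
    for child in mindmap["children"]:
        print("  └─", child)

text = ("Mind mapping encourages learning. It presents knowledge visually. "
        "Automated generation from text saves effort.")
render(build_mindmap("Mind Mapping", text))
```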
Citations: 13