
2011 IEEE International Symposium on Multimedia - Latest Publications

Food Product Information Supplement System - Corresponding to Consumer Needs for Shopping and Eating Out
Pub Date : 2011-12-05 DOI: 10.1109/ISM.2011.72
Kayo H. Iizuka, Takuya Okawada, Yasuki Iizuka
In this paper, the authors propose an effective information system that can supply food product information to meet consumer needs. Consumer awareness of food safety has recently intensified, and food allergy issues seem to be of increasing concern these days, hence effective solutions are required. Improving the quality of food might be an important solution, but supplying consumers with key product information might also be considered important. To help resolve this issue, the authors developed a prototype system to meet consumer needs based on a survey they conducted. Place Engine is implemented for this system, allowing users to estimate their current location using Wi-Fi devices and obtain information about where they can obtain food that meets their requirements nearby.
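As a rough illustration of the kind of lookup such a system performs, the sketch below filters a hypothetical store catalogue by estimated user location and by the allergens a consumer wants to avoid; the data, field names and distance heuristic are all invented here, and PlaceEngine itself is not involved.

```python
# Minimal sketch under stated assumptions: given a user location (e.g. estimated by a
# Wi-Fi positioning service) and a made-up store catalogue with allergen data, list
# nearby stores that carry products the user can eat.
import math

stores = [
    {"name": "Store A", "lat": 35.660, "lon": 139.700,
     "products": [{"name": "rice crackers", "allergens": set()}]},
    {"name": "Store B", "lat": 35.700, "lon": 139.770,
     "products": [{"name": "peanut cookies", "allergens": {"peanut"}}]},
]

def distance_km(lat1, lon1, lat2, lon2):
    """Rough planar distance, adequate for city-scale filtering."""
    return math.hypot((lat1 - lat2) * 111.0, (lon1 - lon2) * 91.0)

def nearby_safe_stores(user_lat, user_lon, avoid, radius_km=2.0):
    hits = []
    for s in stores:
        if distance_km(user_lat, user_lon, s["lat"], s["lon"]) > radius_km:
            continue
        safe = [p["name"] for p in s["products"] if not (p["allergens"] & avoid)]
        if safe:
            hits.append((s["name"], safe))
    return hits

print(nearby_safe_stores(35.659, 139.703, avoid={"peanut"}))
```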
Citations: 2
Support Vector Regression Based Video Quality Prediction
Pub Date : 2011-12-05 DOI: 10.1109/ISM.2011.84
Beibei Wang, D. Zou, Ran Ding
To measure the quality of experience (QoE) of a video, current approaches to objective quality metric development focus on how to design a video quality model which considers the effects of extracted features and models the Human Visual System (HVS). However, video quality metrics that try to model the HVS confront the fact that the HVS is too complicated and too poorly understood to model. In this paper, instead of modeling the objective quality metric with analytic functions, we propose to build a video quality metric using supervised learning with support vector machines (SVMs). The proposed SVM-based video quality prediction allows a much better approximation to the NTIA-VQM and MOS values than the previous G.1070-based video quality prediction. We further investigate how to choose features that can be used efficiently as SVM input variables.
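A minimal sketch of the general idea (supervised regression from per-clip features to quality scores), assuming hypothetical features and placeholder target values rather than the paper's actual NTIA-VQM/MOS data:

```python
# Illustrative only: each row of X stands in for per-clip features (e.g. bitrate,
# frame rate, packet loss) and y for the corresponding quality scores.
import numpy as np
from sklearn.svm import SVR
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = rng.random((200, 3))                                                   # placeholder feature vectors
y = 1.0 + 4.0 * X[:, 0] - 2.0 * X[:, 2] + 0.1 * rng.standard_normal(200)  # placeholder scores

model = make_pipeline(StandardScaler(), SVR(kernel="rbf", C=10.0, epsilon=0.1))
model.fit(X[:150], y[:150])                   # train on one part of the data
predicted_mos = model.predict(X[150:])        # predict quality for unseen clips
print(predicted_mos[:5])
```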
Citations: 7
3D Face Fitting Method Based on 2D Active Appearance Models
Pub Date : 2011-12-05 DOI: 10.1109/ISM.2011.11
Myung-Ho Ju, Hang-Bong Kang
Special cameras such as 3D scanners or depth cameras are necessary for recognizing 3D shapes from input faces. In this paper, we propose an efficient face fitting method which is able to fit various faces, including any variations of 3D pose (rotation about the X and Y axes) and facial expression. Our method takes advantage of 2D Active Appearance Models (AAMs) built from 2D face images rather than depth information measured by special cameras. We first construct an AAM for the variations of facial expression. Then, we estimate depth information for each landmark from frontal and side view images. By combining the estimated depth information with the AAM, we can fit various 3D-transformed faces. Self-occlusions due to 3D pose variation are also handled by a region weighting function on the normalized face at each frame. Our experimental results show that the proposed method fits various faces more efficiently than the typical AAM and the view-based AAM.
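The sketch below illustrates only the geometric core suggested by the abstract: attach a per-landmark depth (estimated from a side view) to 2D AAM landmarks and re-project the resulting 3D points under a yaw rotation. The coordinates are made up, and this is not the authors' fitting procedure.

```python
# Minimal sketch, not the paper's implementation: lift 2D landmarks to 3D with an
# estimated depth, rotate about the Y axis, and project back to the image plane.
import numpy as np

frontal_xy = np.array([[0.0, 0.0], [30.0, 0.0], [15.0, 20.0]])  # (x, y) from a frontal AAM fit
side_depth = np.array([5.0, 5.0, 12.0])                         # z per landmark, estimated from a side view

landmarks_3d = np.column_stack([frontal_xy, side_depth])        # (x, y, z) per landmark

def rotate_y(points, angle_rad):
    """Rotate 3D points around the Y axis (a yaw head turn)."""
    c, s = np.cos(angle_rad), np.sin(angle_rad)
    R = np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])
    return points @ R.T

posed = rotate_y(landmarks_3d, np.deg2rad(30))
projected_xy = posed[:, :2]   # orthographic projection back to the image plane
print(projected_xy)
```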
Citations: 2
Enhancing Local Binary Patterns Distinctiveness for Face Representation
Pub Date : 2011-12-05 DOI: 10.1109/ISM.2011.78
M. Ghahramani, W. Yau, E. Teoh
The Local Binary Pattern (LBP) is a well-known feature that has been widely used for human identification. However, the amount of information it extracts is limited, which reduces the LBP's discriminative power. Recently, some enhancements have been proposed that add preprocessing stages or consider more neighboring pixels to enrich the extracted feature. In this paper, we propose the Uniformly-sampled Thresholds for LBP (UTLBP) operator, which increases the richness of the information derived from the LBP feature. It outperforms other features on various probe sets of the large CAS-PEAL face recognition database. Moreover, we collected a database of 25 families to verify the superiority of the proposed feature for family verification. Results show that using the UTLBP, the total error in face recognition and family verification is reduced by up to 8% and 3% respectively compared to the state-of-the-art LBP. It improves missing-family-member verification performance by up to 3%, whereas, contrary to expectation, increasing the LBP operator radius worsens performance by 2%.
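One plausible reading of uniformly-sampled thresholds is sketched below: instead of the single neighbour-versus-centre test of plain LBP, several thresholds on the neighbour-centre difference are applied, each producing its own code map. This interpretation and the threshold values are assumptions, not the paper's definition.

```python
# Sketch only: multi-threshold LBP codes for a 2D grayscale array.
import numpy as np

def lbp_codes(image, thresholds=(0.0, 8.0, 16.0)):
    """Return one 8-bit LBP code map per threshold."""
    img = image.astype(np.float64)
    center = img[1:-1, 1:-1]
    # 8 neighbours in a fixed clockwise order
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]
    maps = []
    for t in thresholds:
        code = np.zeros_like(center, dtype=np.uint8)
        for bit, (dy, dx) in enumerate(offsets):
            neighbour = img[1 + dy:img.shape[0] - 1 + dy, 1 + dx:img.shape[1] - 1 + dx]
            code |= ((neighbour - center >= t).astype(np.uint8) << bit)
        maps.append(code)
    return maps

img = np.arange(36, dtype=np.float64).reshape(6, 6)
print([m.shape for m in lbp_codes(img)])   # one (4, 4) code map per threshold
```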
Citations: 4
An Approach for Modeling the Effects of Video Resolution and Size on the Perceived Visual Quality
Pub Date : 2011-12-05 DOI: 10.1109/ISM.2011.82
Benjamin Belmudez, S. Möller
Video-telephony and mobile TV are typical multimedia services which are becoming a part of everyday life due to increased bandwidth availability and viewing devices with larger screen sizes (smartphones, PDAs, etc.). To ensure high quality, packet-layer parametric quality prediction models for audio-visual services like video-telephony and IPTV video streaming have emerged and are still under development. Those parametric models depend on a set of parameters which have to be tuned for every specific application. In this work, we carry out an experiment to analyze the impact of video resolution and upscaling on perceived quality. We show that the current parametric models can be modified to explicitly integrate the joint effect of resolution and displayed video size.
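As a purely illustrative example of a parametric model extended with such a correction, the toy function below scales a bitrate-driven quality estimate by the ratio of coding resolution to display resolution; the functional form and all coefficients are invented and are not the model fitted in this work.

```python
# Toy packet-layer parametric model with a resolution/display-size correction factor.
# Every constant below is an assumption made for illustration.
import math

def predicted_mos(bitrate_kbps, coding_height, display_height,
                  a=1.0, b=3.8, c=300.0, d=0.6):
    base = a + b * (1.0 - math.exp(-bitrate_kbps / c))   # toy bitrate-to-quality curve
    scale = min(coding_height / display_height, 1.0)     # penalty when content is upscaled
    return 1.0 + (base - 1.0) * (scale ** d)              # keep the score on a 1..5 MOS scale

print(predicted_mos(800, coding_height=480, display_height=1080))
```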
Citations: 19
The Mosaic Camera: Streaming, Coding and Compositing Experiments
Pub Date : 2011-12-05 DOI: 10.1109/ISM.2011.106
Mashhour Solh, G. Al-Regib
The HP Fan Camera is a panoramic mosaicking camera composed of a 24-imager array system. Streaming the captured video is a challenging problem due to several factors such as the large bandwidth requirements, the limited capabilities of the clients' machines, and our desire to provide independent viewing controls for users. In the process of developing an optimal rate controller for the HP Fan Camera, we developed a client-server framework for multi-camera streaming and performed a set of experiments using various bandwidth allocation schemes. From our preliminary research, we found that sending individual camera streams over the network provides more interactivity to the end users and requires less bandwidth when end users are aggressive in scene selection. In this paper we present this framework and share the results of our experiments.
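The sketch below shows one simple bandwidth allocation step for per-camera streams, assuming the server knows which imagers a client is currently viewing; it is only a toy scheme, not the rate controller developed for the HP Fan Camera.

```python
# Toy allocation: give most of the budget to the cameras in view, keep a trickle for the rest.
def allocate_bandwidth(total_kbps, selected_cameras, all_cameras, background_share=0.2):
    selected = set(selected_cameras)
    others = [c for c in all_cameras if c not in selected]
    fg_budget = total_kbps * (1.0 - background_share)
    bg_budget = total_kbps * background_share
    rates = {c: fg_budget / max(len(selected), 1) for c in selected}
    rates.update({c: bg_budget / max(len(others), 1) for c in others})
    return rates

cameras = list(range(24))                        # 24-imager array
print(allocate_bandwidth(8000, selected_cameras=[3, 4, 5], all_cameras=cameras))
```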
Citations: 0
Concept Learning with Co-occurrence Network for Image Retrieval
Pub Date : 2011-12-05 DOI: 10.1109/ISM.2011.77
Linan Feng, B. Bhanu
This paper addresses the problem of concept learning for semantic image retrieval. Two types of semantic concepts are introduced in our system: the individual concept and the scene concept. The individual concepts are explicitly provided in a vocabulary of semantic words, which are the labels or annotations in an image database. Scene concepts are higher-level concepts defined as potential patterns of co-occurrence of individual concepts. Scene concepts exist because some individual concepts co-occur frequently across different images. This is similar to human learning, where understanding simpler ideas is generally useful before developing more sophisticated ones. Scene concepts can have more discriminative power than individual concepts, but methods are needed to find them. A novel method for deriving scene concepts is presented. It is based on a weighted concept co-occurrence network (graph) with a detected community structure. An image similarity comparison and retrieval framework is described, with the proposed individual and scene concept signatures as the image semantic descriptors. Extensive experiments are conducted on a publicly available dataset to demonstrate the effectiveness of our concept learning and semantic image retrieval framework.
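A minimal sketch of the graph construction step, assuming toy per-image label sets and using off-the-shelf greedy modularity community detection as a stand-in for whatever community detection the authors apply:

```python
# Illustrative only: build a weighted label co-occurrence graph and group labels into
# communities, which can be read as candidate "scene concepts".
from itertools import combinations
import networkx as nx
from networkx.algorithms.community import greedy_modularity_communities

image_labels = [
    {"sky", "sea", "beach"},
    {"sky", "sea", "boat"},
    {"street", "car", "building"},
    {"street", "building", "sky"},
]

G = nx.Graph()
for labels in image_labels:
    for a, b in combinations(sorted(labels), 2):
        w = G.get_edge_data(a, b, default={"weight": 0})["weight"]
        G.add_edge(a, b, weight=w + 1)          # edge weight = co-occurrence count

communities = greedy_modularity_communities(G, weight="weight")
print([sorted(c) for c in communities])         # candidate scene concepts
```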
Citations: 1
On the Applicability of Speaker Diarization to Audio Concept Detection for Multimedia Retrieval
Pub Date : 2011-12-05 DOI: 10.1109/ISM.2011.79
R. Mertens, Po-Sen Huang, L. Gottlieb, G. Friedland, Ajay Divakaran
Recently, audio concepts emerged as a useful building block in multimodal video retrieval systems. Information like "this file contains laughter", "this file contains engine sounds" or "this file contains slow music" can significantly improve purely visual retrieval. The weak point of current approaches to audio concept detection is that they rely heavily on human annotators. In most approaches, audio material is manually inspected to identify relevant concepts. Then instances that contain examples of relevant concepts are selected -- again manually -- and used to train concept detectors. This approach comes with two major disadvantages: (1) it leads to rather abstract audio concepts that hardly cover the audio domain at hand, and (2) the way human annotators identify audio concepts likely differs from the way a computer algorithm clusters audio data, introducing additional noise into the training data. This paper explores whether unsupervised audio segmentation systems can be used to identify useful audio concepts by analyzing training data automatically, and whether these audio concepts can be used for multimedia document classification and retrieval. A modified version of the ICSI (International Computer Science Institute) speaker diarization system finds segments in an audio track that have similar perceptual properties and groups these segments. This article provides an in-depth analysis of the statistical properties of similar acoustic segments identified by the diarization system in a predefined document set and of the theoretical fitness of this approach for discerning one document class from another.
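As a stand-in for the idea of grouping acoustic segments without supervision (not the ICSI diarization system itself), the sketch below clusters placeholder MFCC-like window features so that each cluster can be read as a candidate audio concept.

```python
# Sketch only: the feature matrix is a placeholder; in practice it would be extracted
# from fixed-length audio windows (e.g. MFCCs).
import numpy as np
from sklearn.cluster import AgglomerativeClustering

rng = np.random.default_rng(1)
segment_features = rng.standard_normal((60, 13))     # 60 windows x 13 MFCC-like coefficients

clustering = AgglomerativeClustering(n_clusters=5)   # each cluster ~ one candidate audio concept
concept_ids = clustering.fit_predict(segment_features)
print(concept_ids[:10])                              # concept label per audio window
```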
Citations: 5
Quantification of YouTube QoE via Crowdsourcing
Pub Date : 2011-12-05 DOI: 10.1109/ISM.2011.87
T. Hossfeld, Michael Seufert, Matthias Hirth, T. Zinner, P. Tran-Gia, R. Schatz
This paper addresses the challenge of assessing and modeling Quality of Experience (QoE) for online video services that are based on TCP streaming. We present a dedicated QoE model for YouTube that takes into account the key influence factors (such as stalling events caused by network bottlenecks) that shape quality perception of this service. As a second contribution, we propose a generic subjective QoE assessment methodology for multimedia applications (like online video) that is based on crowdsourcing, a highly cost-efficient, fast and flexible way of conducting user experiments. We demonstrate how our approach successfully leverages the inherent strengths of crowdsourcing while addressing critical aspects such as the reliability of the obtained experimental data. Our results suggest that crowdsourcing is a highly effective QoE assessment method not only for online video, but also for a wide range of other current and future Internet applications.
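For illustration only, the toy function below maps stalling statistics to a MOS-like score with an exponential decay, in the spirit of stalling-based QoE models; the coefficients are invented and are not the model fitted in the paper.

```python
# Toy stalling-to-MOS mapping: more and longer stalling drives the score toward 1.
import math

def stalling_mos(num_stalls, total_stall_seconds, a=0.15, b=0.19):
    """Return a score in [1, 5] that decays exponentially with stalling impairment."""
    impairment = a * num_stalls + b * total_stall_seconds
    return 1.0 + 4.0 * math.exp(-impairment)

print(stalling_mos(num_stalls=2, total_stall_seconds=6.0))
```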
Citations: 362
Audio Recurrence Contribution to a Video-based TV Program Structuring Approach
Pub Date : 2011-12-05 DOI: 10.1109/ISM.2011.15
Alina Elma Abduraman, Sid-Ahmed Berrani, B. Mérialdo
This paper addresses the problem of unsupervised TV program structuring. Program structuring allows direct and non-linear access to the desired parts of a program. Our work addresses the structuring of recurrent TV programs such as news, entertainment programs, TV shows, and TV magazines. In a previous work we proposed a program structuring method based on the detection of video recurrences. In this paper we extend our study to audio recurrences and verify their influence on the final structuring. We evaluate the structuring results for both approaches (audio and video) separately and jointly. For evaluation we use a 62-hour dataset corresponding to 97 episodes of TV programs.
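The sketch below illustrates recurrence detection in its crudest form: coarsely quantized per-window feature vectors are hashed, and windows whose fingerprints reappear are flagged as recurrences. The feature matrix is a placeholder, and this is not the authors' method.

```python
# Toy recurrence detector over placeholder per-window acoustic features.
from collections import defaultdict
import numpy as np

rng = np.random.default_rng(2)
features = rng.standard_normal((500, 8))        # one feature vector per audio window
features[400:410] = features[100:110]           # inject a repeated jingle for the demo

buckets = defaultdict(list)
for idx, vec in enumerate(np.round(features, 1)):   # coarse quantization as a cheap fingerprint
    buckets[tuple(vec)].append(idx)

recurrences = {k: v for k, v in buckets.items() if len(v) > 1}
print(sorted(i for idxs in recurrences.values() for i in idxs))  # windows that recur
```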
Citations: 1