
Latest Publications: 2012 IEEE International Symposium on Multimedia

Batch Mode Active Learning for Multimedia Pattern Recognition
Pub Date: 2012-12-10, DOI: 10.1109/ISM.2012.101
Shayok Chakraborty, V. Balasubramanian, S. Panchanathan
Multimedia applications like face recognition and facial expression recognition inherently rely on the availability of a large amount of labeled data to train a robust recognition system. In order to induce a reliable classification model for a multimedia pattern recognition application, the data is typically labeled by human experts based on some domain knowledge. However, manual annotation of a large number of images is an expensive process in terms of time, labor and human expertise. This has led to the development of active learning algorithms, which automatically identify the salient instances from a given set of unlabeled data and are effective in reducing the human annotation effort to train a classification model. Further, to address the possible presence of multiple labeling oracles, there have been efforts towards a batch form of active learning, where a set of unlabeled images is selected simultaneously for labeling instead of a single image at a time. Existing algorithms on batch mode active learning (BMAL) concentrate only on the development of a batch selection criterion and assume that the batch size (the number of samples to be queried from the unlabeled set) is specified in advance. However, in multimedia applications like face/facial expression recognition, it is difficult to decide on a batch size in advance because of the dynamic nature of video streams. Further, multimedia applications like facial expression recognition involve a fuzzy label space because of the imprecision and the vagueness in the class label boundaries. This necessitates a BMAL framework for fuzzy label problems. To address these fundamental challenges, we propose two novel BMAL techniques in this work: (i) a framework for dynamic batch mode active learning, which adaptively selects the batch size and the specific instances to be queried based on the complexity of the data stream being analyzed, and (ii) a BMAL algorithm for fuzzy label classification problems. To the best of our knowledge, this is the first attempt to develop such techniques in the active learning literature.
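The paper's selection criterion is not detailed in the abstract, but the core idea of dynamic batch sizing can be illustrated with a small sketch. The Python below (all names hypothetical) grows the query batch with a crude entropy-based proxy for the complexity of the incoming frames, then picks the most uncertain ones; it is an illustration of the concept, not the authors' algorithm.

```python
import numpy as np

def prediction_entropy(probs):
    """Shannon entropy of each row of class-probability predictions."""
    p = np.clip(probs, 1e-12, 1.0)
    return -(p * np.log(p)).sum(axis=1)

def select_dynamic_batch(probs, min_batch=1, max_batch=20, threshold=1.0):
    """Choose which unlabeled frames to query for labels.

    probs: (n_samples, n_classes) classifier outputs on the unlabeled pool.
    The batch size grows with the number of high-entropy frames -- a crude
    stand-in for the paper's data-stream complexity measure.
    """
    scores = prediction_entropy(probs)
    hard = int((scores > threshold).sum())
    batch_size = int(np.clip(hard, min_batch, max_batch))
    return np.argsort(scores)[::-1][:batch_size]  # most uncertain first

# Toy usage: a pool of 100 frames with 5 expression classes.
rng = np.random.default_rng(0)
pool_probs = rng.dirichlet(np.ones(5), size=100)
queried = select_dynamic_batch(pool_probs)
```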
Citations: 2
EDContours: High-Speed Parameter-Free Contour Detector Using EDPF
Pub Date: 2012-12-10, DOI: 10.1109/ISM.2012.37
C. Akinlar, C. Topal
We present EDContours, a high-speed contour detector that works by running our real-time parameter-free edge segment detector, Edge Drawing Parameter Free (EDPF), at different scale-space representations of an image. Combining the edge segments detected by EDPF at different scales, EDContours generates a soft contour map for a given image. EDContours works on gray-scale images, is parameter-free, runs very fast, and achieves an F-measure score of 0.62 on the Berkeley Segmentation Dataset (BSDS300).
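EDPF itself is not assumed to be available here, so the sketch below substitutes OpenCV's Canny detector; what it does reproduce is the multi-scale combination step, averaging per-scale edge maps of a Gaussian scale space into a soft contour map. File names and thresholds are placeholders.

```python
import cv2
import numpy as np

def soft_contour_map(gray, sigmas=(1.0, 2.0, 4.0)):
    """Average edge maps over a Gaussian scale space into a soft map.

    gray: 8-bit single-channel image. Canny stands in for EDPF; the
    per-scale accumulation mirrors EDContours' combination step.
    """
    acc = np.zeros(gray.shape, dtype=np.float32)
    for sigma in sigmas:
        blurred = cv2.GaussianBlur(gray, (0, 0), sigma)
        acc += cv2.Canny(blurred, 50, 150).astype(np.float32) / 255.0
    return acc / len(sigmas)  # in [0, 1]: fraction of scales voting "edge"

gray = cv2.imread("input.png", cv2.IMREAD_GRAYSCALE)
cv2.imwrite("contours.png", (soft_contour_map(gray) * 255).astype(np.uint8))
```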
Citations: 6
3D Model Hypotheses for Player Segmentation and Rendering in Free-Viewpoint Soccer Video
Pub Date: 2012-12-10, DOI: 10.1109/ISM.2012.47
Haopeng Li, M. Flierl
This paper presents a player segmentation approach for soccer games based on 3D model hypotheses. We use a hyperplane model for player modeling and a collection of piecewise geometric models for background modeling. To determine the assignment of each pixel in the image plane, we test it against the two model hypotheses. We construct a cost function that measures the fitness of each model hypothesis for each pixel. To fully utilize the perspective diversity of the multiview imagery, we propose a three-step strategy to choose the best model for each pixel. The experimental results show that our segmentation approach based on 3D model hypotheses outperforms conventional temporal median and graph cut methods in both subjective and objective evaluation.
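The abstract does not spell out the cost function or the three-step selection, but the per-pixel hypothesis test itself is simple. The sketch below (with a squared color difference as a placeholder cost) labels each pixel with whichever rendered model fits it better.

```python
import numpy as np

def segment_by_hypotheses(frame, player_render, background_render):
    """Assign each pixel to the lower-cost model hypothesis.

    frame, player_render, background_render: (H, W, 3) float arrays,
    the latter two being renderings of the player and background models.
    The squared color difference is a placeholder for the paper's cost.
    Returns a boolean (H, W) mask, True where the player hypothesis wins.
    """
    cost_player = ((frame - player_render) ** 2).sum(axis=2)
    cost_background = ((frame - background_render) ** 2).sum(axis=2)
    return cost_player < cost_background
```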
Citations: 0
TEEVE-Remote: A Novel User-Interaction Solution for 3D Tele-immersive System
Pub Date: 2012-12-10, DOI: 10.1109/ISM.2012.77
Pengye Xia, K. Nahrstedt, M. A. Jurik
A 3D tele-immersion (3DTI) system enables geographically distributed users to interact with each other in a virtual 3D space. Many 3DTI applications require users to move frequently within the application (e.g., 3D interactive exergaming, remote therapy). However, the traditional user interaction (UI) solution for 3DTI systems (a large display with mouse/keyboard) does not give users much freedom to move during the interaction and thus has difficulty meeting this requirement. In this work, we design and implement a novel UI solution, TEEVE-Remote, which utilizes state-of-the-art camera, mobile phone and display technologies to overcome these difficulties and thereby significantly improve the user experience of a 3DTI system.
Citations: 1
Automated Visual Quality Analysis for Media Production
Pub Date: 2012-12-10, DOI: 10.1109/ISM.2012.82
Hannes Fassold, Stefanie Wechtitsch, Albert Hofmann, W. Bailer, P. Schallauer, R. Borgotallo, A. Messina, Mohan Liu, P. Ndjiki-Nya, Peter Altendorf
Automatic quality control for audiovisual media is an important tool in the media production process. In this paper we present tools for assessing the quality of audiovisual content in order to decide about the reusability of archive content. We first discuss automatic detectors for the common impairments: noise and grain, video breakups, sharpness, image dynamics and blocking. For efficient viewing and verification of the automatic results by an operator, three approaches for user interfaces are presented. Finally, we discuss the integration of the tools into a service-oriented architecture, focusing on the recent standardization efforts by EBU and AMWA's Joint Task Force on a Framework for Interoperability of Media Services in TV Production (FIMS).
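The individual detectors are not specified in the abstract; as one concrete example of the kind of per-frame impairment check such a system runs, the sketch below flags blurry frames with the common variance-of-Laplacian sharpness measure. The threshold is an arbitrary placeholder, and this is not the paper's detector.

```python
import cv2

def sharpness_score(gray):
    """Variance of the Laplacian: low values suggest a blurry frame.

    A common stand-in sharpness measure, not the paper's detector.
    """
    return cv2.Laplacian(gray, cv2.CV_64F).var()

def flag_blurry_frames(frames, threshold=100.0):
    """Yield indices of BGR frames whose sharpness falls below threshold."""
    for i, frame in enumerate(frames):
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        if sharpness_score(gray) < threshold:
            yield i
```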
Citations: 4
Detailed Comparative Analysis of VP8 and H.264
Pub Date: 2012-12-10, DOI: 10.1109/ISM.2012.33
Yousef O. Sharrab, Nabil J. Sarhan
VP8 has recently been offered by Google as an open video compression format in an attempt to compete with the widely used H.264 video compression standard. This paper describes the major differences between VP8 and H.264 and provides detailed comparative evaluations through extensive experiments. We use 29 raw video sequences, offering a wide spectrum of resolutions and content characteristics, with resolutions ranging from 176×144 (QCIF) to 3840×2160 (2160p). To ensure a fair study, we use 3 coding presets in H.264, each with three types of tuning, and 7 presets in VP8. The presets cover a variety of achieved quality or complexity levels. The performance metrics include accuracy of bit rate handling, encoding speed, decoding speed, and perceptual video quality.
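The paper's evaluation pipeline is not reproduced here, but one building block of any such codec comparison is an objective distortion measure between each decoded frame and its raw source. A minimal PSNR implementation (a standard objective baseline, not the perceptual metric the authors report) looks like this:

```python
import numpy as np

def psnr(reference, decoded, peak=255.0):
    """Peak signal-to-noise ratio between a raw and a decoded frame.

    reference, decoded: uint8 arrays of identical shape.
    """
    mse = np.mean((reference.astype(np.float64) -
                   decoded.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")  # identical frames
    return 10.0 * np.log10(peak ** 2 / mse)
```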
Citations: 19
Automatic Classification of Teeth in Bitewing Dental Images Using OLPP
Pub Date: 2012-12-10, DOI: 10.1109/ISM.2012.26
Nourdin Al-sherif, G. Guo, H. Ammar
Teeth classification is an important component in building an Automated Dental Identification System (ADIS), as part of creating a data structure that guides tooth-to-tooth matching. This aids in avoiding illogical comparisons that both inefficiently consume the limited computational resources and mislead decision-making. We tackle this problem by using the low-computational-cost, appearance-based Orthogonal Locality Preserving Projection (OLPP) algorithm to assign an initial class, i.e., molar or premolar, to the teeth in bitewing dental images. After this initial classification, we use a string matching technique, based on teeth neighborhood rules, to validate the initial teeth classes and thus assign each tooth a number corresponding to its location in the dental chart. On a large dataset of bitewing films containing 622 teeth, the proposed approach achieves a classification accuracy of 89%, and teeth class validation enhances the overall teeth classification accuracy to 92%.
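The neighborhood rules themselves are not given in the abstract. As a hedged illustration of the validation idea only, the sketch below aligns an observed molar/premolar string ('M'/'P') from the initial classification against a hypothetical quadrant template (two premolars followed by three molars) by edit distance; the best offset then fixes each tooth's chart position.

```python
def edit_distance(a, b):
    """Levenshtein distance via a one-row dynamic program."""
    dp = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        prev, dp[0] = dp[0], i
        for j, cb in enumerate(b, 1):
            prev, dp[j] = dp[j], min(dp[j] + 1,          # deletion
                                     dp[j - 1] + 1,      # insertion
                                     prev + (ca != cb))  # substitution
    return dp[-1]

def best_chart_offset(observed, template="PPMMM"):
    """Slide the observed class string (no longer than the template) over
    the quadrant template and return the offset of the best match."""
    offsets = range(len(template) - len(observed) + 1)
    return min(offsets,
               key=lambda k: edit_distance(observed,
                                           template[k:k + len(observed)]))

# e.g. best_chart_offset("PMM") -> 1: the first tooth is the second premolar.
```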
Citations: 5
Understanding Your Needs: An Adaptive VoD System
Pub Date: 2012-12-10, DOI: 10.1109/ISM.2012.55
Mu Mu, W. Knowles, N. Race
Video-on-demand (VoD) is becoming a popular service for commercial content distribution by offering end users the freedom to access recorded programmes. The management of on-demand assets is essential to maximise the efficiency of storage and network utilisation as well as advertisement. This paper introduces our recent efforts in the design and implementation of an adaptive VoD archive system in an IPTV infrastructure. The system exploits live statistics on user behaviour as well as the dynamic popularity of VoD programmes. Using the modelled programme popularity function, the VoD archive is capable of managing the VoD repository by adapting to the most recent user requests. The design has greatly improved the activity of the VoD repository and the user experience of on-demand services.
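The modelled popularity function is not given in the abstract; as a hypothetical stand-in, the sketch below scores each programme by an exponentially decayed request count (the half-life is a free parameter) and evicts the coldest asset when the archive overflows, which captures the "adapt to the most recent user requests" behaviour in miniature.

```python
import math
import time

class AdaptiveArchive:
    """Toy VoD archive ranked by a decayed-request popularity score.

    The exponential decay is a hypothetical stand-in for the paper's
    modelled programme popularity function.
    """

    def __init__(self, capacity, half_life=86400.0):
        self.capacity = capacity
        self.rate = math.log(2) / half_life
        self.scores = {}  # programme id -> (score, last-update timestamp)

    def record_request(self, prog_id, now=None):
        now = time.time() if now is None else now
        score, t = self.scores.get(prog_id, (0.0, now))
        decayed = score * math.exp(-self.rate * (now - t))
        self.scores[prog_id] = (decayed + 1.0, now)
        if len(self.scores) > self.capacity:
            self._evict(now)

    def _evict(self, now):
        """Drop the programme with the lowest decayed score."""
        def current(item):
            _, (score, t) = item
            return score * math.exp(-self.rate * (now - t))
        del self.scores[min(self.scores.items(), key=current)[0]]
```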
Citations: 5
Incorporating Fuzziness in Extended Local Ternary Patterns
Pub Date: 2012-12-10, DOI: 10.1109/ISM.2012.36
W. Liao
Local binary/ternary patterns are widely employed to describe the structure of an image region. However, local patterns are very sensitive to noise due to the thresholding process. In this paper, we propose two different approaches to incorporate fuzziness in extended local ternary patterns (ELTP) to enhance the robustness of this class of operator against interferences. The first approach replaces the ternary mapping mechanism with fuzzy membership functions to arrive at a fuzzy ELTP representation. The second approach modifies the clustering operation in formulating ELTP into a fuzzy C-means procedure to construct soft histograms in the final feature representation, denoted as FCM-ELTP. In experiments designed to compare the performance of ELTP with the newly proposed fuzzy ELTP and FCM-ELTP, both fuzzy descriptors exhibit better resistance to noise.
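The paper's membership functions are not reproduced here; the sketch below shows the general idea behind the first approach by replacing hard ternary thresholding with linear ramps, so each neighbor casts partial votes for the negative/zero/positive states. For brevity the votes are collapsed into a 3-bin soft histogram, whereas ELTP histograms run over full patterns.

```python
import numpy as np

def fuzzy_ternary_memberships(diff, t=5.0):
    """Soft ternary codes for neighbor-minus-center differences.

    Instead of the hard rule (+1 if diff >= t, -1 if diff <= -t, else 0),
    linear ramps give each pixel graded membership in the three states;
    the memberships sum to 1. The paper's functions may differ.
    """
    m_pos = np.clip(diff / (2.0 * t), 0.0, 1.0)
    m_neg = np.clip(-diff / (2.0 * t), 0.0, 1.0)
    return m_neg, 1.0 - m_pos - m_neg, m_pos

def fuzzy_soft_histogram(patch, t=5.0):
    """3-bin soft histogram of fuzzy ternary codes over 8-neighborhoods."""
    patch = patch.astype(np.float64)  # avoid uint8 wrap-around
    center = patch[1:-1, 1:-1]
    h, w = patch.shape
    hist = np.zeros(3)
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):
            if dy == dx == 0:
                continue
            neighbor = patch[1 + dy:h - 1 + dy, 1 + dx:w - 1 + dx]
            memberships = fuzzy_ternary_memberships(neighbor - center, t)
            for b, m in enumerate(memberships):
                hist[b] += m.sum()
    return hist / hist.sum()
```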
Citations: 2
Color-Weakness Compensation Using Riemann Normal Coordinates
Pub Date: 2012-12-10, DOI: 10.1109/ISM.2012.42
S. Oshima, Rika Mochizuki, R. Lenz, J. Chao
We introduce normal coordinates in Riemann spaces as a tool to construct color-weak compensation methods. We use them to compute color stimuli for a color-weak observer that result in the same color perception as the original image presented to a color-normal observer, in the sense that the perceived color differences are identical for both. The compensation is obtained through a color-difference-preserving map, i.e. an isometry between the 3D color spaces of a color-normal and any given color-weak observer. This approach uses discrimination threshold data and is free from approximation errors due to local linearization. The performance is evaluated with the help of semantic differential (SD) tests.
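In symbols (notation mine, not the paper's), the compensation map described in the abstract is an isometry between the two perceptual color spaces, where the distances are the Riemannian metrics induced from the discrimination-threshold data:

```latex
% \mathcal{C}_n, d_n: color space and perceived distance of a color-normal
% observer; \mathcal{C}_w, d_w: those of the color-weak observer.
% "Color-difference preserving" means:
\varphi : (\mathcal{C}_n, d_n) \longrightarrow (\mathcal{C}_w, d_w),
\qquad
d_w\bigl(\varphi(x), \varphi(y)\bigr) = d_n(x, y)
\quad \text{for all } x, y \in \mathcal{C}_n .
```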
Citations: 3