
Latest publications: Proceedings of the 21st ACM international conference on Multimedia

Error recovered hierarchical classification
Pub Date: 2013-10-21 DOI: 10.1145/2502081.2502182
Shiai Zhu, Xiao-Yong Wei, C. Ngo
Hierarchical classification (HC) is a popular and efficient way of detecting semantic concepts in images. However, conventional HC, which always follows the branch with the highest classification response, risks propagating serious errors from higher levels of the hierarchy down to lower levels. We argue that the highest-response-first strategy is too arbitrary, because candidate nodes are considered individually, ignoring the semantic relationships among them. In this paper, we propose a novel HC method that exploits the semantic relationships among candidate nodes and their children to recover the responses of unreliable candidate-node classifiers, with the aim of giving branch selection a more globally valid and semantically consistent view. Experimental results show that the proposed method outperforms conventional HC methods and achieves a satisfactory balance between accuracy and efficiency.
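The branch-selection idea in this abstract can be sketched in a few lines. The toy tree, the response values, and the simple averaging rule below are made-up illustrations of the recovery idea, not the authors' actual formulation:

```python
# Sketch: conventional highest-response-first descent vs. a "recovered"
# score that blends each candidate's response with its children's, so a
# node whose subtree disagrees with it is trusted less. All values and
# the 0.5/0.5 blend are illustrative assumptions, not the paper's method.

def descend(tree, responses, recover=False):
    """Walk from the root to a leaf, choosing one child per level.

    tree: dict node -> list of children; responses: dict node -> score.
    """
    node, path = "root", []
    while tree.get(node):
        def score(c):
            kids = tree.get(c, [])
            if recover and kids:
                # Recovered response: average the node's own classifier
                # output with the mean response of its children.
                return 0.5 * responses[c] + 0.5 * sum(responses[k] for k in kids) / len(kids)
            return responses[c]
        node = max(tree[node], key=score)
        path.append(node)
    return path

# Toy hierarchy: "animal" fires strongly, but its children do not,
# hinting that its high response is an unreliable classifier's error.
tree = {"root": ["animal", "vehicle"], "animal": ["cat", "dog"], "vehicle": ["car", "bus"]}
responses = {"animal": 0.9, "vehicle": 0.7, "cat": 0.1, "dog": 0.2, "car": 0.8, "bus": 0.6}

print(descend(tree, responses))               # conventional: follows "animal"
print(descend(tree, responses, recover=True)) # recovered: prefers "vehicle"
```

With these numbers the recovered score of "animal" drops to 0.525 while "vehicle" keeps 0.7, so the semantically consistent branch wins.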
Citations: 7
Activity-aware adaptive compression: a morphing-based frame synthesis application in 3DTI
Pub Date: 2013-10-21 DOI: 10.1145/2502081.2508116
Chien-Nan Chen, Pengye Xia, K. Nahrstedt
In view of the differing quality-of-service demands of different user activities in 3D Tele-immersive (3DTI) environments, we combine activity recognition with real-time morphing-based compression and present Activity-Aware Adaptive Compression. We implement this scheme on our 3DTI platform, the TEEVE Endpoint, a runtime engine that handles the creation, transmission, and rendering of 3DTI data. A user study and an objective evaluation show that the scheme saves 25% more bandwidth than conventional 3D data compression such as zlib, without perceptible degradation of the user experience.
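For a rough sense of the zlib baseline the authors compare against, the snippet below measures the compression ratio zlib achieves on a synthetic byte buffer; the buffer is a made-up stand-in for a real 3DTI frame, not their data:

```python
# Baseline sketch: generic zlib compression of a frame-sized buffer.
# The repetitive synthetic content is only an illustration.
import zlib

frame = bytes(range(256)) * 64          # stand-in for a 3DTI frame payload
packed = zlib.compress(frame, level=6)  # level 6 is zlib's default trade-off
ratio = len(packed) / len(frame)
print(f"compressed to {ratio:.1%} of original size")
```

An activity-aware scheme can beat this because, unlike zlib, it knows which parts of the scene the current user activity actually needs at full fidelity.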
Citations: 11
Cross-media topic mining on wikipedia
Pub Date: 2013-10-21 DOI: 10.1145/2502081.2502180
Xikui Wang, Yang Liu, Donghui Wang, Fei Wu
As a collaborative wiki-based encyclopedia, Wikipedia provides a huge number of articles across various categories. In addition to the text corpus, Wikipedia contains plenty of images, which make the articles more intuitive for readers to understand. To better organize these visual and textual data, one promising line of research is to jointly model the embedded topics across multi-modal (i.e., cross-media) data from Wikipedia. In this work, we propose to learn projection matrices that map data from heterogeneous feature spaces into a unified latent topic space. Unlike previous approaches, by imposing l1 regularizers on the projection matrices, only a small number of relevant visual/textual words are associated with each topic, which makes our model more interpretable and robust. Furthermore, the correlations among Wikipedia data in different modalities are explicitly considered in our model. The effectiveness of the proposed topic extraction algorithm is verified by several experiments on real Wikipedia datasets.
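The effect of the l1 regularizer on the projection matrices can be illustrated with the standard soft-thresholding (proximal) step; the matrix sizes, random values, and threshold below are made-up for illustration and are not the paper's actual optimization:

```python
# Sketch of what the l1 penalty does: the soft-threshold step zeroes out
# small entries of a words-by-topics projection matrix, so each latent
# topic ends up associated with only a few visual/textual words.
import numpy as np

def soft_threshold(W, lam):
    """Proximal operator of lam * ||W||_1 (element-wise shrinkage)."""
    return np.sign(W) * np.maximum(np.abs(W) - lam, 0.0)

rng = np.random.default_rng(0)
W = rng.normal(scale=0.3, size=(100, 5))   # hypothetical words x topics matrix
W_sparse = soft_threshold(W, lam=0.4)

nonzero_per_topic = (W_sparse != 0).sum(axis=0)
print(nonzero_per_topic)  # only a fraction of the 100 words stay active
```

This is why the learned topics become interpretable: each topic column keeps only the words whose association is strong enough to survive the shrinkage.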
Citations: 8
Speaking swiss: languages and venues in foursquare
Pub Date: 2013-10-21 DOI: 10.1145/2502081.2502133
Darshan Santani, D. Gática-Pérez
Due to increasing globalization, urban societies are becoming more multicultural. The availability of large-scale digital mobility traces, e.g. from tweets or check-ins, provides an opportunity to explore multiculturalism that until recently could only be addressed with survey-based methods. In this paper we examine a basic facet of multiculturalism through the lens of language use across multiple cities in Switzerland. Using data obtained from Foursquare over 330 days, we present a descriptive analysis of linguistic differences and similarities across five urban agglomerations in a multicultural western European country.
Citations: 10
A multigrid approach for bandwidth and display resolution aware streaming of 3D deformations
Pub Date: 2013-10-21 DOI: 10.1145/2502081.2502181
Yuan Tian, Y. Yang, X. Guo, B. Prabhakaran
In this paper, we propose a novel multimedia system that adaptively streams animation according to display resolution and/or network bandwidth. A multigrid-like technique is used in this framework to accelerate the convergence of the nonlinear deformation-energy optimization. The computation proceeds from the coarsest mesh at the top level down to the finest mesh at the bottom level, and then returns to the top. This V-shaped calculation provides great flexibility in a networked environment: clients receive the data stream corresponding to their display resolution and network bandwidth. A more compact packaging of the deformation data is also used, so that a cube element needs only six parameters instead of the 24 variables of a regular mesh representation, which significantly reduces the network overhead of streaming.
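The V-shaped level schedule the abstract describes (coarsest to finest and back) can be written down directly; this is only a sketch of the traversal order, with a made-up helper name, not the deformation solver itself:

```python
# The V-cycle schedule: solve on the coarsest mesh first, refine down to
# the finest, then climb back up. A client can stop at whichever level
# matches its display resolution and bandwidth.

def v_cycle_levels(n_levels):
    """Level indices visited in one V-cycle (0 = coarsest)."""
    down = list(range(n_levels))      # coarsest -> finest
    return down + down[-2::-1]        # ...and back up to the coarsest

print(v_cycle_levels(4))  # [0, 1, 2, 3, 2, 1, 0]
```

The coarse levels at the start of the cycle are cheap to transmit, which is what lets a low-bandwidth client receive a usable (if coarse) deformation early.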
Citations: 1
Session details: Similarity search
Pub Date: 2013-10-21 DOI: 10.1145/3245288
Yong Rui
Citations: 0
Spot the differences: from a photograph burst to the single best picture
Pub Date: 2013-10-21 DOI: 10.1145/2502081.2502190
H. E. Tasli, J. V. Gemert, T. Gevers
With the rise of the digital camera, people nowadays typically take several near-identical photos of the same scene to maximize the chances of a good shot. This paper proposes a user-friendly tool for exploring a personal photo gallery and selecting, or even creating, the best shot of a scene from its multiple alternatives. This functionality is realized through a graphical user interface in which the best viewpoint can be selected from a generated panorama of the scene. Once the viewpoint is selected, the user can explore possible alternatives from the other images. Using this tool, one can browse a photo gallery efficiently. Moreover, additional compositions from other images are also possible: with such compositions, one can go from a burst of photographs to the single best picture. Even playful compositions, in which a person is duplicated within the same image, are possible with the proposed tool.
Citations: 1
Towards efficient sparse coding for scalable image annotation
Pub Date: 2013-10-21 DOI: 10.1145/2502081.2502127
Junshi Huang, Hairong Liu, Jialie Shen, Shuicheng Yan
Content-based retrieval methods remain the development trend of traditional retrieval systems. Image labels, one of the most popular approaches to the semantic representation of images, can fully capture the representative information of images. To achieve high retrieval performance, precise image annotation becomes inevitable. However, given the massive number of images on the Internet, one cannot annotate them all without a scalable and flexible (i.e., training-free) annotation method. In this paper, we investigate the problem of accelerating sparse-coding-based scalable image annotation, for which off-the-shelf solvers are generally inefficient on large-scale datasets. By leveraging the prior that most reconstruction coefficients should be zero, we develop a general and efficient framework that derives an accurate solution to the large-scale sparse coding problem by solving a series of much smaller subproblems. In this framework, an active variable set, which expands and shrinks iteratively, is maintained, with each snapshot of the active set corresponding to a subproblem. Meanwhile, the convergence of the proposed framework to the global optimum is theoretically provable. To further accelerate the framework, a sub-linear-time hashing strategy, e.g. Locality-Sensitive Hashing, is seamlessly integrated. Extensive empirical experiments on the NUS-WIDE and IMAGENET datasets demonstrate that the proposed framework achieves orders-of-magnitude acceleration for large-scale image annotation, with zero/negligible accuracy loss for the cases without/with hashing speed-up, compared to the expensive off-the-shelf solvers.
Citations: 18
Motion matters: a novel framework for compressing surveillance videos
Pub Date: 2013-10-21 DOI: 10.1145/2502081.2502145
Xiaojie Guo, Siyuan Li, Xiaochun Cao
Video surveillance currently plays a very important role in public safety and security. Storing surveillance videos, which usually contain extremely long sequences, requires huge space. Video compression techniques such as H.264/AVC can reduce the storage load to some extent. However, existing codecs are not sufficiently effective or efficient for encoding surveillance videos, since they do not exploit a key characteristic of such videos: the background is highly redundant. This paper introduces a novel framework for compressing such videos. We first train a background dictionary from a small number of observed frames. With the trained background dictionary, we then separate every frame into background and motion (foreground), and store the compressed motion together with the reconstruction coefficients of the background over the background dictionary. Decoding applies the inverse procedure to the encoded frame. Experimental results on extensive surveillance videos demonstrate that the proposed method significantly reduces video size while achieving much higher PSNR than state-of-the-art codecs.
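The background/motion split at the heart of the framework can be sketched in a heavily simplified form: below, a single background template stands in for the learned dictionary, and per frame only the pixels that differ from it (the motion) are stored. The function names, threshold, and toy frames are illustrative assumptions, not the paper's implementation:

```python
# Toy sketch of the encode/decode round trip: store the background once,
# and for each frame keep only the (index, value) pairs of pixels that
# deviate from it by more than a threshold.
import numpy as np

def encode(frames, background, thresh=10):
    coded = []
    for f in frames:
        mask = np.abs(f.astype(int) - background.astype(int)) > thresh
        coded.append((np.flatnonzero(mask), f.reshape(-1)[mask.reshape(-1)]))
    return coded

def decode(coded, background):
    out = []
    for idx, vals in coded:
        f = background.copy().reshape(-1)
        f[idx] = vals                     # paste the motion back onto the background
        out.append(f.reshape(background.shape))
    return out

bg = np.full((4, 4), 100, dtype=np.uint8)
frame = bg.copy(); frame[1, 1] = 200      # a single "moving" pixel
coded = encode([frame], bg)
rec = decode(coded, bg)[0]
print(np.array_equal(rec, frame))  # True
```

Because static surveillance backgrounds barely change, the per-frame payload shrinks to the foreground alone, which is where the large size reduction comes from.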
Citations: 13
Social life networks: a multimedia problem?
Pub Date: 2013-10-21 DOI: 10.1145/2502081.2502279
Amarnath Gupta, R. Jain
Connecting people to the resources they need is a fundamental task for any society. We present the idea of a technology for the middle tier of a society that uses people's mobile devices and social networks to connect the needy with providers. We conceive of a world observatory called the Social Life Network (SLN) that connects people and things and monitors people's needs as their life situations evolve. Such a system requires the SLN to register and recognize situations by combining people's activities with data streaming from personal devices and environmental sensors, and, based on those situations, to make the connections when possible. But is this a multimedia problem? We show that many pattern recognition, machine learning, sensor fusion, and information retrieval techniques used in multimedia-related research are deeply connected to the SLN problem. We sketch the functional architecture of such a system and show where these techniques fit.
Citations: 11