Pushing Image Recognition in the Real World: Towards Recognizing Millions of Entities
Xiansheng Hua. WISMM '14. DOI: https://doi.org/10.1145/2661714.2661716

Building a system that can recognize "what," "who," and "where" from arbitrary images has motivated researchers in the computer vision, multimedia, and machine learning communities for decades. Significant progress has been made in recent years based on distributed computation and deep neural network techniques. However, it is still very challenging to realize a general-purpose, real-world image recognition engine with reasonable recognition accuracy, semantic coverage, and recognition speed.

In this talk, we will first review the current status of this area, analyze the difficulties, and discuss potential solutions. We will then introduce two promising schemes for attacking this challenge: (1) learning millions of concepts from search engine click logs, and (2) recognizing whatever you want without data labeling. The first work builds large-scale recognition models by mining search engine click logs. We will discuss the challenges in training data selection and model selection, and introduce efficient and scalable approaches for model training and prediction. The second work aims at building image recognition engines for any set of entities without using any human-labeled training data, which helps generalize image recognition to a wide range of semantic concepts. We will present the automatic training data generation steps and discuss techniques for improving recognition accuracy by effectively leveraging massive amounts of Internet data. We will also introduce different parallelization strategies for different computation tasks, which guarantee the efficiency and scalability of the entire system. Finally, we will discuss possible directions for pushing image recognition into the real world.
Storytelling with Big Multimedia Data: Keynote Talk
R. Jain. WISMM '14. DOI: https://doi.org/10.1145/2661714.2661715

Big data is increasingly multimedia data. Storytelling is one of the oldest and most popular human activities. Since the early days of human existence, storytelling has been used as a means of simple communication as well as a medium for entertainment, education, cultural preservation, and instilling moral values through examples. A story is a presentation of experiences related to events; events and their experiences are selected to communicate the intent of the story compellingly. The art of storytelling has always had a close relationship to the technology of its time. A good story considers the message and the audience, and then selects appropriate events and related experiential media and information to weave a compelling and engaging account of those events.

Storytelling and technology form a virtuous cycle, intertwined and synergistic. Historically, the two have evolved together and are likely to continue doing so in the near future. Most events of interest occur in the physical world and must be captured using sensors. A single sensor is usually inadequate to capture the diverse aspects of an event; hence multiple sensors and media are used both to capture an event and to present its experiences so the event can be re-experienced. We now have diverse sensors to capture an event in all its detail and to select what will be compelling in storytelling.

A good story is the result of many activities: collecting data, analyzing it, selecting the events and experiences relevant to the message, and presenting this material compellingly. All of these activities are active research areas in multimedia big data. We discuss different forms of storytelling as they have evolved and the role of technology in the different stages of storytelling. We believe we now have powerful tools and technologies to make the art of storytelling truly effective. In this presentation we will pose challenges for multimedia researchers whose solutions could make storytelling highly effective and compelling.
Large-Scale Aerial Image Categorization by Multi-Task Discriminative Topologies Discovery
Yingjie Xia, Luming Zhang, Suhua Tang. WISMM '14. DOI: https://doi.org/10.1145/2661714.2661718

Categorizing the millions of aerial images on Google Maps quickly and accurately is a useful technique in multimedia applications. Existing methods cannot handle this task effectively for two reasons: (1) it is challenging to build a real-time image categorization system, as some geo-aware apps update more than 20 aerial images per second; and (2) the topologies of aerial images are key to distinguishing their categories, yet they cannot be encoded by generic visual descriptors. To solve these two problems, we propose an efficient aerial image categorization system that mines the discriminative topologies of aerial images under a multi-task learning framework. Specifically, we first construct a region adjacency graph (RAG) that describes the topology of each aerial image, so that aerial image categorization can be formulated as RAG-to-RAG matching. Drawing on graph theory, RAG-to-RAG matching is conducted by comparing the graphs' respective graphlets (i.e., small subgraphs). Because the number of graphlets is huge, a multi-task feature selection algorithm is derived to discover topologies that are jointly discriminative across multiple categories. The discovered topologies are used to extract the discriminative graphlets. Finally, these graphlets are integrated into an AdaBoost model for predicting aerial image categories. Experiments show that our approach is competitive with several existing recognition models. Furthermore, over 24 aerial images are categorized per second, showing that our system is ready for real-world applications.
Social Popularity Score: Predicting Numbers of Views, Comments, and Favorites of Social Photos Using Only Annotations
T. Yamasaki, Shumpei Sano, K. Aizawa. WISMM '14. DOI: https://doi.org/10.1145/2661714.2661722

In this paper, we propose an algorithm that predicts the social popularity (i.e., the numbers of views, comments, and favorites) of content on social networking services using only text annotations. Instead of analyzing image/video content, we estimate social popularity from a combination of the weight vectors obtained from support vector regression (SVR) and tag frequency. Because the algorithm uses text annotations rather than image/video features, its computational cost is small; as a result, we can estimate social popularity more efficiently than previously proposed methods. Furthermore, the tags that significantly affect social popularity can be extracted with our algorithm. Our experiments used one million photos from the social networking website Flickr, and the results showed a high correlation between actual social popularity and the values estimated by our algorithm. Moreover, the proposed algorithm achieves high accuracy in classifying content as popular or unpopular.