
Latest publications from Frontiers of Multimedia Research

Multimedia fog computing: minions in the cloud and crowd
Pub Date : 2017-12-19 DOI: 10.1145/3122865.3122876
Cheng-Hsin Hsu, Hua-Jun Hong, Tarek Elgamal, K. Nahrstedt, N. Venkatasubramanian
In cloud computing, minions refer to the virtual or physical machines that carry out the actual workload. Minions in the cloud sit in faraway data centers, which makes cloud computing less friendly to multimedia applications. The fog computing paradigm pushes minions toward edge networks. We adopt a generalized definition in which minions run on end devices owned by the crowd. Serious uncertainty in multimedia fog platforms, such as dynamic network conditions, limited battery levels, and unpredictable minion availability, makes them harder to manage than cloud platforms. In this chapter, we share our experience in utilizing resources from the crowd to optimize multimedia applications. The lessons learned shed some light on the optimal design of a unified multimedia fog platform for distributed multimedia applications.
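To make the uncertainty argument concrete, here is a minimal, hypothetical sketch (not the chapter's actual system) of how a fog controller might score crowd-owned minions before offloading a multimedia task. The `Minion` structure, attribute names, thresholds, and weights are illustrative assumptions.

```python
from dataclasses import dataclass

@dataclass
class Minion:
    """A crowd-owned end device that can host part of a multimedia workload."""
    name: str
    bandwidth_mbps: float   # current uplink estimate (changes over time)
    battery_pct: float      # remaining battery, 0-100
    availability: float     # probability the owner keeps the device reachable, 0-1

def score(m: Minion) -> float:
    """Heuristic utility of offloading to this minion.

    Penalizes low battery and unreliable availability (the two uncertainty
    sources highlighted in the abstract) and rewards network capacity.
    """
    if m.battery_pct < 15 or m.availability < 0.5:
        return 0.0  # too risky to use as a minion right now
    return m.availability * min(m.bandwidth_mbps / 10.0, 1.0) * (m.battery_pct / 100.0)

def pick_minion(minions: list[Minion]) -> Minion | None:
    """Choose the most promising minion, or None if no device is usable."""
    scored = [(score(m), m) for m in minions if score(m) > 0]
    return max(scored, key=lambda t: t[0])[1] if scored else None

if __name__ == "__main__":
    crowd = [
        Minion("phone-A", bandwidth_mbps=4.0, battery_pct=80, availability=0.9),
        Minion("tablet-B", bandwidth_mbps=20.0, battery_pct=10, availability=0.95),
        Minion("laptop-C", bandwidth_mbps=12.0, battery_pct=60, availability=0.7),
    ]
    print(pick_minion(crowd))  # laptop-C wins: adequate battery, bandwidth, availability
```

A real controller would update these estimates continuously and re-plan when a minion disappears; the point of the sketch is only that placement decisions must fold battery, network, and availability uncertainty into a single policy.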
Citations: 4
Audition for multimedia computing
Pub Date : 2017-12-19 DOI: 10.1145/3122865.3122868
G. Friedland, P. Smaragdis, Josh H. McDermott, B. Raj
What do the fields of robotics, human-computer interaction, AI, video retrieval, privacy, cybersecurity, Internet of Things, and big data all have in common? They all work with various sources of data: visual, textual, time stamps, links, records. But there is one source of data that has been almost completely ignored by the academic community---sound.

Our comprehension of the world relies critically on audition---the ability to perceive and interpret the sounds we hear. Sound is ubiquitous, and is a unique source of information about our environment and the events occurring in it. Just by listening, we can determine whether our child's laughter originated inside or outside our house, how far away they were when they laughed, and whether the window through which the sound passed was open or shut. The ability to derive information about the world from sound is a core aspect of perceptual intelligence.

Auditory inferences are often complex and sophisticated despite their routine occurrence. The number of possible inferences is typically not enumerable, and the final interpretation is not merely one of selection from a fixed set. And yet humans perform such inferences effortlessly, based only on sounds captured using two sensors, our ears.

Electronic devices can also "perceive" sound. Every phone and tablet has at least one microphone, as do most cameras. Any device or space can be equipped with microphones at minimal expense. Indeed, machines can not only "listen"; they have potential advantages over humans as listening devices, in that they can communicate and coordinate their experiences in ways that biological systems simply cannot. Collections of devices that can sense sound and communicate with each other could instantiate a single electronic entity that far surpasses humans in its ability to record and process information from sound.

And yet machines at present cannot truly hear. Apart from well-developed efforts to recover structure in speech and music, the state of the art in machine hearing is limited to relatively impoverished descriptions of recorded sounds: detecting occurrences of a limited pre-specified set of sound types, and their locations. Although researchers typically envision artificially intelligent agents such as robots to have human-like hearing abilities, at present the rich descriptions and inferences humans can make about sound are entirely beyond the capability of machine systems.

In this chapter, we suggest establishing the field of Computer Audition to develop the theory behind artificial systems that extract information from sound. Our objective is to enable computer systems to replicate and exceed human abilities. This chapter describes the challenges of this field.
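As a toy illustration of the "pre-specified set of sound types" level of machine hearing described above (and emphatically not the Computer Audition the authors call for), the hedged sketch below flags high-energy frames in a waveform and labels each with the nearest of a few fixed spectral templates. All names, thresholds, and the synthetic templates are assumptions for illustration only.

```python
import numpy as np

def frame_signal(x: np.ndarray, frame: int = 1024, hop: int = 512) -> np.ndarray:
    """Split a mono waveform into overlapping frames (one frame per row)."""
    n = 1 + max(0, len(x) - frame) // hop
    return np.stack([x[i * hop : i * hop + frame] for i in range(n)])

def detect_events(x: np.ndarray, templates: dict[str, np.ndarray],
                  energy_thresh: float = 0.01) -> list[tuple[int, str]]:
    """Return (frame_index, label) for frames whose RMS energy exceeds a threshold.

    Each active frame is labeled with the fixed class whose unit-norm magnitude
    spectrum (length frame//2 + 1, built offline from labeled clips) is closest
    in Euclidean distance -- the closed-set regime, not open-ended auditory inference.
    """
    events = []
    for i, f in enumerate(frame_signal(x)):
        if np.sqrt(np.mean(f ** 2)) < energy_thresh:
            continue  # silence / background
        spec = np.abs(np.fft.rfft(f))
        spec /= (np.linalg.norm(spec) + 1e-9)
        label = min(templates, key=lambda k: np.linalg.norm(spec - templates[k]))
        events.append((i, label))
    return events

if __name__ == "__main__":
    sr = 16_000
    t = np.arange(sr) / sr
    def unit_spec(sig):                       # unit-norm magnitude spectrum of one frame
        s = np.abs(np.fft.rfft(sig[:1024]))
        return s / (np.linalg.norm(s) + 1e-9)
    templates = {"beep_440": unit_spec(np.sin(2 * np.pi * 440 * t)),
                 "beep_880": unit_spec(np.sin(2 * np.pi * 880 * t))}
    clip = np.concatenate([np.zeros(8000), 0.5 * np.sin(2 * np.pi * 880 * t)])
    print(detect_events(clip, templates))     # frames covering the tone get label "beep_880"
```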
Citations: 1
Efficient similarity search
Pub Date : 2017-12-19 DOI: 10.1145/3122865.3122871
H. Jégou
This chapter addresses one of the fundamental problems involved in multimedia systems, namely efficient similarity search for large collections of multimedia content. This problem has received a lot of attention from various research communities. In particular, it is a historical line of research in computational geometry and databases. The computer vision and multimedia communities have adopted pragmatic approaches guided by practical requirements: the large sets of features required to describe image collections make visual search a highly demanding task. As a result, early works [Flickner et al. 1995, Fagin 1998, Beis and Lowe 1997] in image indexing foresaw the interest in approximate algorithms, especially after the dissemination of methods based on local description in the 90s, as any improvement obtained on this indexing part improves the whole visual search system.

Among the existing approximate nearest neighbors (ANN) strategies, the popular framework of Locality-Sensitive Hashing (LSH) [Indyk and Motwani 1998, Gionis et al. 1999] provides theoretical guarantees on the search quality with limited assumptions on the underlying data distribution. It was first proposed [Indyk and Motwani 1998] for the Hamming and l1 spaces, and was later extended to the Euclidean/cosine cases [Charikar 2002, Datar et al. 2004] or the earth mover's distance [Charikar 2002, Andoni and Indyk 2006]. LSH has been successfully used for local descriptors [Ke et al. 2004], 3D object indexing [Matei et al. 2006, Shakhnarovich et al. 2006], and other fields such as audio retrieval [Casey and Slaney 2007, Ryynanen and Klapuri 2008]. It has also received some attention in the context of private information retrieval [Pathak and Raj 2012, Aghasaryan et al. 2013, Furon et al. 2013].

A few years ago, approaches inspired by compression, and more specifically quantization-based approaches [Jégou et al. 2011], were shown to be a viable alternative to hashing methods, and were shown to be successful for efficient search in a billion-sized dataset.

This chapter discusses these different trends. It is organized as follows. Section 5.1 gives some background references and concepts, including evaluation issues. Most of the methods and variants are presented within the LSH framework. It is worth mentioning that LSH is more of a concept than a particular algorithm. The search algorithms associated with LSH follow two distinct search mechanisms, the cell-probe model and sketches, which are discussed in Sections 5.2 and 5.3, respectively. Section 5.4 describes methods inspired by compression algorithms, while Section 5.5 discusses hybrid approaches combining the non-exhaustiveness of the cell-probe model with the advantages of sketches or compression-based algorithms. Metrics other than Euclidean and cosine are briefly discussed in Section 5.6.
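To ground the LSH concept mentioned in the abstract, the following is a minimal sketch of random-hyperplane hashing for cosine similarity in the spirit of [Charikar 2002], built only on NumPy. The class name, bucketing scheme, and parameter values are illustrative assumptions, not the chapter's exact algorithms.

```python
import numpy as np

class RandomHyperplaneLSH:
    """Sign-of-random-projection (SimHash) LSH for cosine similarity.

    Similar vectors agree on most hyperplane signs, so they tend to fall in the
    same bucket; a query then ranks only the candidates in its own bucket
    instead of scanning the whole collection.
    """

    def __init__(self, dim: int, n_bits: int = 12, seed: int = 0):
        rng = np.random.default_rng(seed)
        self.planes = rng.standard_normal((n_bits, dim))   # one random hyperplane per bit
        self.buckets: dict[int, list[int]] = {}
        self.vectors: list[np.ndarray] = []

    def _hash(self, v: np.ndarray) -> int:
        bits = (self.planes @ v) > 0                        # sign of each projection
        return int("".join("1" if b else "0" for b in bits), 2)

    def add(self, v: np.ndarray) -> None:
        idx = len(self.vectors)
        self.vectors.append(v)
        self.buckets.setdefault(self._hash(v), []).append(idx)

    def query(self, q: np.ndarray, k: int = 5) -> list[int]:
        """Approximate k-NN: rank only the candidates sharing the query's bucket.

        A single hash table can miss a true neighbor; real systems use several
        tables and/or multi-probe to trade accuracy against speed.
        """
        cands = self.buckets.get(self._hash(q), [])
        def cos(i: int) -> float:
            v = self.vectors[i]
            return float(q @ v / (np.linalg.norm(q) * np.linalg.norm(v) + 1e-12))
        return sorted(cands, key=cos, reverse=True)[:k]

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    data = rng.standard_normal((10_000, 128))
    index = RandomHyperplaneLSH(dim=128)
    for v in data:
        index.add(v)
    noisy_copy = data[0] + 0.05 * rng.standard_normal(128)
    print(index.query(noisy_copy))             # item 0 should usually rank first
```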
Citations: 13
Encrypted domain multimedia content analysis
Pub Date : 2017-12-19 DOI: 10.1145/3122865.3122870
P. Atrey, Ankita Lathey, M. A. Yakubu
{"title":"Encrypted domain multimedia content analysis","authors":"P. Atrey, Ankita Lathey, M. A. Yakubu","doi":"10.1145/3122865.3122870","DOIUrl":"https://doi.org/10.1145/3122865.3122870","url":null,"abstract":"","PeriodicalId":408764,"journal":{"name":"Frontiers of Multimedia Research","volume":"14 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-12-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115286311","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Hawkes processes for events in social media
Pub Date : 2017-12-19 DOI: 10.1145/3122865.3122874
Marian-Andrei Rizoiu, Young Lee, Swapnil Mishra
This chapter provides an accessible introduction to point processes, and especially Hawkes processes, for modeling discrete, inter-dependent events over continuous time. We start by reviewing the definitions and key concepts in point processes. We then introduce the Hawkes process and its event intensity function, as well as schemes for event simulation and parameter estimation. We also describe a practical example drawn from social media data---we show how to model retweet cascades using a Hawkes self-exciting process. We present a design of the memory kernel, and results on estimating parameters and predicting popularity. The code and sample event data are available in an online repository.
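As a small companion to the abstract, the sketch below evaluates the conditional intensity of a univariate Hawkes process with an exponential memory kernel and simulates it with Ogata's thinning algorithm. The parameter values are illustrative assumptions, not the chapter's fitted ones.

```python
import numpy as np

def intensity(t: float, events: list[float], mu: float, alpha: float, beta: float) -> float:
    """Conditional intensity lambda(t) = mu + sum_{t_i < t} alpha * beta * exp(-beta * (t - t_i)).

    mu is the background rate; each past event adds a self-exciting kick that decays
    exponentially. alpha is the expected number of offspring per event (branching factor),
    so alpha < 1 keeps cascades from exploding.
    """
    past = np.asarray([ti for ti in events if ti < t])
    return mu + alpha * beta * np.exp(-beta * (t - past)).sum()

def simulate_hawkes(mu: float, alpha: float, beta: float, horizon: float,
                    seed: int = 0) -> list[float]:
    """Ogata thinning: propose points from an upper-bounding Poisson rate, then accept
    each proposal with probability lambda(t) / upper_bound."""
    rng = np.random.default_rng(seed)
    events, t = [], 0.0
    while t < horizon:
        # The intensity decays between events, so its value just after t (plus the kick of an
        # event accepted exactly at t) bounds it until the next proposal.
        lam_bar = intensity(t, events, mu, alpha, beta) + alpha * beta
        t += rng.exponential(1.0 / lam_bar)
        if t < horizon and rng.uniform() <= intensity(t, events, mu, alpha, beta) / lam_bar:
            events.append(t)
    return events

if __name__ == "__main__":
    ev = simulate_hawkes(mu=0.2, alpha=0.6, beta=1.0, horizon=200.0)
    # For a stationary process, E[N(T)] is roughly mu * T / (1 - alpha) = 100 here.
    print(len(ev), "events simulated")
```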
Citations: 46
Situation recognition using multimodal data
Pub Date : 2017-12-19 DOI: 10.1145/3122865.3122873
Vivek K. Singh
{"title":"Situation recognition using multimodal data","authors":"Vivek K. Singh","doi":"10.1145/3122865.3122873","DOIUrl":"https://doi.org/10.1145/3122865.3122873","url":null,"abstract":"","PeriodicalId":408764,"journal":{"name":"Frontiers of Multimedia Research","volume":"7 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-12-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130320065","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1
Cloud gaming
Pub Date : 2017-12-19 DOI: 10.1145/3122865.3122877
Kuan-Ta Chen, Wei Cai, R. Shea, Chun-Ying Huang, Jiangchuan Liu, Victor C. M. Leung, Cheng-Hsin Hsu
{"title":"Cloud gaming","authors":"Kuan-Ta Chen, Wei Cai, R. Shea, Chun-Ying Huang, Jiangchuan Liu, Victor C. M. Leung, Cheng-Hsin Hsu","doi":"10.1145/3122865.3122877","DOIUrl":"https://doi.org/10.1145/3122865.3122877","url":null,"abstract":"","PeriodicalId":408764,"journal":{"name":"Frontiers of Multimedia Research","volume":"114 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-12-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128808785","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 2
Multimodal analysis of free-standing conversational groups
Pub Date : 2017-12-19 DOI: 10.1145/3122865.3122869
Xavier Alameda-Pineda, E. Ricci, N. Sebe
"Free-standing conversational groups" are what we call the elementary building blocks of social interactions formed in settings when people are standing and congregate in groups. The automatic detection, analysis, and tracking of such structural conversational units captured on camera poses many interesting challenges for the research community. First, although delineating these formations is strongly linked to other behavioral cues such as head and body poses, finding methods that successfully describe and exploit these links is not obvious. Second, the use of visual data is crucial, but when analyzing crowded scenes, one must account for occlusions and low-resolution images. In this regard, the use of other sensing technologies such as wearable devices can facilitate the analysis of social interactions by complementing the visual information. Yet the exploitation of multiple modalities poses other challenges in terms of data synchronization, calibration, and fusion. In this chapter, we discuss recent advances in multimodal social scene analysis, in particular for the detection of conversational groups or F-formations [Kendon 1990]. More precisely, a multimodal joint head and body pose estimator is described and compared to other recent approaches for head and body pose estimation and F-formation detection. Experimental results on the recently published SALSA dataset are reported, they evidence the long road toward a fully automated high-precision social scene analysis framework.
Citations: 1
Utilizing implicit user cues for multimedia analytics
Pub Date : 1900-01-01 DOI: 10.1145/3122865.3122875
Subramanian Ramanathan, S. O. Gilani, N. Sebe
{"title":"Utilizing implicit user cues for multimedia analytics","authors":"Subramanian Ramanathan, S. O. Gilani, N. Sebe","doi":"10.1145/3122865.3122875","DOIUrl":"https://doi.org/10.1145/3122865.3122875","url":null,"abstract":"","PeriodicalId":408764,"journal":{"name":"Frontiers of Multimedia Research","volume":"153 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127271110","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0