
Latest publications from ICMI-MLMI '10

Conversation scene analysis based on dynamic Bayesian network and image-based gaze detection
Pub Date: 2010-11-08 | DOI: 10.1145/1891903.1891969
Sebastian Gorga, K. Otsuka
This paper presents a probabilistic framework, which incorporates automatic image-based gaze detection, for inferring the structure of multiparty face-to-face conversations. This framework aims to infer conversation regimes and gaze patterns from the nonverbal behaviors of meeting participants, which are captured from image and audio streams with cameras and microphones. The conversation regime corresponds to a global conversational pattern such as monologue and dialogue, and the gaze pattern indicates "who is looking at whom". Input nonverbal behaviors include presence/absence of utterances, head directions, and discrete head-centered eye-gaze directions. In contrast to conventional meeting analysis methods that focus only on the participant's head pose as a surrogate of visual focus of attention, this paper newly incorporates vision-based gaze detection combined with head pose tracking into a probabilistic conversation model based on a dynamic Bayesian network. Our gaze detector is able to differentiate 3 to 5 different eye gaze directions, e.g., left, straight, and right. Experiments on four-person conversations confirm the power of the proposed framework in identifying conversation structure and in estimating gaze patterns with higher accuracy than previous models.
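The paper's model itself is not reproduced here, but the kind of inference it performs can be illustrated with a small sketch. The following Python snippet, a loose stand-in and not the authors' DBN, reduces the problem to a two-regime hidden chain (monologue vs. dialogue) observed through per-frame speaker counts; the transition and emission probabilities are invented for illustration.

```python
import numpy as np

# Hypothetical reduction of the paper's DBN: regime 0 = monologue,
# regime 1 = dialogue. The observation is the number of participants
# speaking in a frame (0..4 for a four-person meeting).
REGIMES = ["monologue", "dialogue"]
trans = np.array([[0.95, 0.05],   # P(regime_t | regime_{t-1}); regimes persist
                  [0.05, 0.95]])
emit = np.array([[0.10, 0.80, 0.07, 0.02, 0.01],   # monologue: one speaker
                 [0.05, 0.45, 0.35, 0.10, 0.05]])  # dialogue: more overlap

def forward_filter(obs, prior=np.array([0.5, 0.5])):
    """Return P(regime_t | obs_1..t) for each frame (forward algorithm)."""
    belief = prior.copy()
    out = []
    for o in obs:
        belief = emit[:, o] * (trans.T @ belief)  # predict, then update
        belief /= belief.sum()                    # renormalize
        out.append(belief.copy())
    return np.array(out)

if __name__ == "__main__":
    frames = [1, 1, 1, 1, 2, 2, 3, 2, 2]  # one speaker, then overlapping talk
    for t, b in enumerate(forward_filter(frames)):
        print(t, REGIMES[int(b.argmax())], b.round(2))
```

The actual model additionally conditions gaze patterns and head/eye observations on the regime; this sketch keeps only the temporal filtering step.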
Citations: 32
Employing social gaze and speaking activity for automatic determination of the Extraversion trait
Pub Date: 2010-11-08 | DOI: 10.1145/1891903.1891913
B. Lepri, Subramanian Ramanathan, Kyriaki Kalimeri, Jacopo Staiano, F. Pianesi, N. Sebe
In order to predict the Extraversion personality trait, we exploit medium-grained behaviors enacted in group meetings, namely, speaking time and social attention (social gaze). The latter is further distinguished into attention given to the group members and attention received from them. The results of our work confirm many of our hypotheses: a) speaking time and (some forms of) social gaze are effective in automatically predicting Extraversion; b) classification accuracy is affected by the size of the time slices used for analysis; and c) to a large extent, the consideration of the social context does not add much to prediction accuracy, with an important exception concerning social gaze.
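As a toy illustration of this classification setup (not the authors' pipeline, features, or data), the sketch below builds per-time-slice features for speaking time, gaze given, and gaze received, then cross-validates an off-the-shelf classifier on synthetic labels; every number and the choice of logistic regression are assumptions.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n = 200  # synthetic time slices

# Hypothetical labels and per-slice features in [0, 1]:
# fraction of slice spent speaking, gaze given to others, gaze received.
extravert = rng.integers(0, 2, size=n)
speak = np.clip(0.25 + 0.20 * extravert + 0.15 * rng.normal(size=n), 0, 1)
gaze_given = np.clip(0.40 + 0.10 * rng.normal(size=n), 0, 1)
gaze_recv = np.clip(0.30 + 0.15 * extravert + 0.15 * rng.normal(size=n), 0, 1)
X = np.column_stack([speak, gaze_given, gaze_recv])

# The effect of slice size could be probed by re-running this with features
# aggregated over longer or shorter windows.
print("CV accuracy:", cross_val_score(LogisticRegression(), X, extravert, cv=5).mean())
```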
Citations: 39
Focusing computational visual attention in multi-modal human-robot interaction
Pub Date: 2010-11-08 | DOI: 10.1145/1891903.1891912
Boris Schauerte, G. Fink
Identifying verbally and non-verbally referred-to objects is an important aspect of human-robot interaction. Most importantly, it is essential for achieving a joint focus of attention and, thus, a natural interaction behavior. In this contribution, we introduce a saliency-based model that reflects how multi-modal referring acts influence visual search, i.e., the task of finding a specific object in a scene. To this end, we combine positional information obtained from pointing gestures with contextual knowledge about the visual appearance of the referred-to object obtained from language. The available information is then integrated into a biologically-motivated saliency model that forms the basis for visual search. We demonstrate the feasibility of the proposed approach by presenting the results of an experimental evaluation.
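A minimal sketch of how such cue combination might look (the Gaussian pointing prior, the colour-distance appearance cue, and the multiplicative fusion are assumptions, not the paper's model):

```python
import numpy as np

def gaussian_prior(shape, point, sigma):
    """Spatial prior centred where the pointing gesture intersects the image."""
    r = np.arange(shape[0])[:, None]
    c = np.arange(shape[1])[None, :]
    return np.exp(-((r - point[0]) ** 2 + (c - point[1]) ** 2) / (2 * sigma ** 2))

def color_match(img, target_rgb, tau=60.0):
    """Top-down appearance cue: high where pixels resemble the named colour."""
    d = np.linalg.norm(img.astype(float) - np.asarray(target_rgb, float), axis=-1)
    return np.exp(-d / tau)

def combined_saliency(bottom_up, img, point, target_rgb, sigma=40.0):
    s = bottom_up * gaussian_prior(bottom_up.shape, point, sigma) \
                  * color_match(img, target_rgb)
    return s / (s.max() + 1e-9)

if __name__ == "__main__":
    img = np.random.randint(0, 256, (120, 160, 3), dtype=np.uint8)
    bu = np.random.rand(120, 160)  # stand-in bottom-up saliency map
    sal = combined_saliency(bu, img, point=(60, 80), target_rgb=(255, 0, 0))
    print("most salient location:", np.unravel_index(sal.argmax(), sal.shape))
```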
Citations: 51
Design and evaluation of a wearable remote social touch device
Pub Date: 2010-11-08 | DOI: 10.1145/1891903.1891959
Rongrong Wang, Francis K. H. Quek, J. Teh, A. Cheok, Sep Riang Lai
Psychological and sociological studies have established the essential role that touch plays in interpersonal communication. However, this channel is largely ignored in current telecommunication technologies. We design and implement a remote touch armband with an electric motor actuator, paired with a touch input device in the form of a force-sensor-embedded smart phone case. When the smart phone is squeezed, the paired armband is activated to simulate a squeeze on the user's upper arm. A usability study with 22 participants evaluates the device in terms of perceptibility. The results show that users can easily perceive touch at different force levels.
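The paper describes the device rather than its firmware; the sketch below illustrates one plausible squeeze-to-squeeze mapping, with the force thresholds, duty cycles, and sensor/motor interfaces all hypothetical:

```python
import time

# Hypothetical (min force in newtons, motor duty cycle) levels.
LEVELS = [(2.0, 0.3), (5.0, 0.6), (8.0, 1.0)]

def duty_for_force(force_n):
    """Map a sensed squeeze force to an armband motor duty cycle (0 = off)."""
    duty = 0.0
    for threshold, d in LEVELS:
        if force_n >= threshold:
            duty = d
    return duty

def control_loop(read_force, set_motor_duty, period_s=0.05):
    """Poll the phone-case force sensor and drive the armband actuator."""
    while True:
        set_motor_duty(duty_for_force(read_force()))
        time.sleep(period_s)

if __name__ == "__main__":
    for f in (0.0, 1.5, 3.0, 6.0, 9.0):
        print(f"force {f:.1f} N -> duty {duty_for_force(f):.1f}")
```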
Citations: 15
Activity-based Ubicomp: a new research basis for the future of human-computer interaction
Pub Date: 2010-11-08 | DOI: 10.1145/1891903.1891940
J. Landay
Ubiquitous computing (Ubicomp) is bringing computing off the desktop and into our everyday lives. For example, an interactive display can be used by the family of an elder to stay in constant touch with the elder's everyday wellbeing, or by a group to visualize and share information about exercise and fitness. Mobile sensors, networks, and displays are proliferating worldwide in mobile phones, enabling this new wave of applications that are intimate with the user's physical world. In addition to being ubiquitous, these applications share a focus on high-level activities: long-term social processes that take place in multiple environments and are supported by complex computation and inference over sensor data. However, the promise of this Activity-based Ubicomp is unfulfilled, primarily due to methodological, design, and tool limitations in how we understand the dynamics of activities. The traditional cognitive psychology basis for human-computer interaction, which focuses on our short-term interactions with technological artifacts, is insufficient for achieving the promise of Activity-based Ubicomp. We are developing design methodologies and tools, as well as activity recognition technologies, both to demonstrate the potential of Activity-based Ubicomp and to support designers in fruitfully creating these types of applications.
Citations: 0
Visual speech synthesis by modelling coarticulation dynamics using a non-parametric switching state-space model
Pub Date: 2010-11-08 | DOI: 10.1145/1891903.1891942
S. Deena, Shaobo Hou, Aphrodite Galata
We present a novel approach to speech-driven facial animation using a non-parametric switching state-space model based on Gaussian processes. The model is an extension of the shared Gaussian process dynamical model, augmented with switching states. Audio and visual data from a talking-head corpus are jointly modelled using the proposed method. The switching states are found using variable-length Markov models trained on labelled phonetic data. We also propose a synthesis technique that takes into account both previous and future phonetic context, thus accounting for coarticulatory effects in speech.
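The abstract names variable-length Markov models as the mechanism for finding the switching states. A minimal sketch of that idea follows: predict the next state from the longest previously seen suffix of the recent history, backing off to shorter contexts. The training data, label alphabet, and maximum order are invented.

```python
from collections import defaultdict, Counter

def train_vlmm(sequences, max_order=3):
    """Count next-symbol frequencies for every context up to max_order."""
    counts = defaultdict(Counter)
    for seq in sequences:
        for i in range(len(seq)):
            for k in range(max_order + 1):
                if i - k < 0:
                    break
                counts[tuple(seq[i - k:i])][seq[i]] += 1
    return counts

def predict(counts, history, max_order=3):
    """Back off from the longest matching suffix of `history` to shorter ones."""
    for k in range(min(max_order, len(history)), -1, -1):
        ctx = tuple(history[len(history) - k:])
        if counts[ctx]:
            return counts[ctx].most_common(1)[0][0]
    return None

if __name__ == "__main__":
    # Invented label sequences standing in for the labelled phonetic data.
    data = ["sil h e l o sil".split(), "sil h e l p sil".split()]
    model = train_vlmm(data)
    print(predict(model, ["h", "e", "l"]))  # 'o' or 'p' (tied counts here)
```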
Citations: 19
Understanding contextual factors in location-aware multimedia messaging
Pub Date: 2010-11-08 | DOI: 10.1145/1891903.1891933
Abdallah El Ali, F. Nack, L. Hardman
Location-aware messages left by people can make visible some aspects of their everyday experiences at a location. To understand the contextual factors surrounding how users produce and consume location-aware multimedia messaging (LMM), we use an experience-centered framework that makes explicit the different aspects of an experience. Using this framework, we conducted an exploratory diary study aimed at eliciting implications for the study and design of LMM systems. In an earlier pilot study, we found that subjects did not have enough time to fully capture their everyday experiences using an LMM prototype, which led us to conduct a longer study using a multimodal diary method. The diary study data (verified for reliability using a categorization task) provided a closer look at the different aspects (spatiotemporal, social, affective, and cognitive) of people's experience. From the data, we derive three main findings (predominant LMM domains and tasks, capturing experience vs. the experience of capture, and context-dependent personalization) to inform the study and design of future LMM systems.
Citations: 5
Cloud mouse: a new way to interact with the cloud
Pub Date: 2010-11-08 | DOI: 10.1145/1891903.1891920
Chunhui Zhang, Min Wang, R. Harper
In this paper we present a novel input device and associated UI metaphors for cloud computing. Cloud computing will give users access to huge amounts of data in new forms, anywhere and anytime, with applications ranging from Web data mining to social networks. The motivation of this work is to provide users access to cloud computing through a new personal device and to turn nearby displays into personal displays. The key points of this device are direct-point operation, a graspable UI, and tangible feedback. A UI metaphor for cloud computing is also introduced.
Citations: 5
Musical performance as multimodal communication: drummers, musical collaborators, and listeners
Pub Date: 2010-11-08 | DOI: 10.1145/1891903.1891922
R. Ashley
Musical performance provides an interesting domain for understanding and investigating multimodal communication. Although the primary modality of music is auditory, musicians make considerable use of the visual channel as well. This talk examines musical performance as multimodal, focusing on drumming in one style of popular music (funk or soul music). The ways drummers interact and communicate with their musical collaborators and with listeners are examined in terms of the structure of different musical parts; processes of mutual coordination, entrainment, and turn-taking (complementarity) are highlighted. Both pre-determined (composed) and spontaneous (improvised) behaviors are considered. The way in which digital drumsets function as complexly structured human interfaces to sound synthesis systems is examined as well.
Citations: 0
Haptic numbers: three haptic representation models for numbers on a touch screen phone
Pub Date: 2010-11-08 | DOI: 10.1145/1891903.1891949
Toni Pakkanen, R. Raisamo, Katri Salminen, Veikko Surakka
Systematic research on haptic stimuli is needed to create a viable haptic feel for user interface elements. There has been a lot of research on haptic user interface prototypes, but much less on haptic stimulus design. In this study we compared three haptic representation models, each at two presentation rates, for the numbers on a phone keypad. Haptic representations for the numbers were derived from Arabic numerals, Roman numerals, and the location of the number button in the keypad grid. Using a Nokia 5800 Express Music phone, participants entered phone numbers blindly. Speed, error rate, and subjective experiences were recorded. The results showed that the representation model had no effect on measured performance, but subjective experiences were affected: the Arabic numerals at the slower presentation rate were preferred most. Thus, subjectively the performance was rated as better, even though objective measures showed no differences.
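The concrete pulse patterns are given in the paper, not reproduced here; purely to illustrate what a location-derived haptic code could look like, the sketch below encodes a key as row-count pulses followed by column-count pulses on the 4x3 keypad grid. The pattern, timings, and output format are all invented.

```python
# Map each key to its (row, col) in a 4x3 phone keypad layout.
KEYPAD = {key: divmod(i, 3) for i, key in enumerate("123456789*0#")}

def location_code(key, pulse_ms=80, gap_ms=120, group_gap_ms=300):
    """Encode a key as (row+1) pulses, a longer gap, then (col+1) pulses.
    Returns a list of (vibrate_ms, pause_ms) pairs for a vibration motor."""
    row, col = KEYPAD[key]
    pulses = []
    for count, tail in ((row + 1, group_gap_ms), (col + 1, 0)):
        for i in range(count):
            pulses.append((pulse_ms, gap_ms if i < count - 1 else tail))
    return pulses

if __name__ == "__main__":
    print("5", location_code("5"))  # '5' is row 1, col 1 -> 2 + 2 pulses
```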
Citations: 9