
ICMI-MLMI '10: Latest Publications

3D user-perspective, voxel-based estimation of visual focus of attention in dynamic meeting scenarios
Pub Date: 2010-11-08. DOI: 10.1145/1891903.1891966
M. Voit, R. Stiefelhagen
In this paper we present a new framework for the online estimation of people's visual focus of attention from their head poses in dynamic meeting scenarios. We describe a voxel-based approach to reconstruct the scene composition from an observer's perspective, in order to integrate occlusion handling and visibility verification. The observer's perspective is thereby simulated with live head pose tracking over four far-field views from the room's upper corners. We integrate motion and speech activity as further scene observations in a Bayesian Surprise framework to model prior attractors of attention within the situation's context. As evaluations on a dedicated dataset with 10 meeting videos show, this allows us to predict a meeting participant's focus of attention correctly in up to 72.2% of all frames.
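The Bayesian Surprise framework mentioned in the abstract scores how strongly a new observation shifts the belief over attention targets. As a rough illustration, not the authors' implementation, the sketch below updates a discrete prior over focus targets with an observation likelihood and measures surprise as the KL divergence between posterior and prior; the target set, distributions, and numbers are all made up.

```python
import numpy as np

def bayesian_surprise(prior, likelihood):
    """Update a discrete belief with an observation likelihood and
    return (posterior, surprise), where surprise is the KL divergence
    KL(posterior || prior) in bits."""
    posterior = prior * likelihood
    posterior /= posterior.sum()
    surprise = float(np.sum(posterior * np.log2(posterior / prior)))
    return posterior, surprise

# Made-up belief over four focus targets (three participants + a display).
prior = np.array([0.25, 0.25, 0.25, 0.25])
# Hypothetical likelihood of the current observation (speech at target 1).
likelihood = np.array([0.1, 0.7, 0.1, 0.1])
posterior, s = bayesian_surprise(prior, likelihood)
print(posterior, f"surprise = {s:.3f} bits")
```

A large surprise value would mark the observation (here, speech activity) as a prior attractor of attention in the current context.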
Citations: 18
Grounding spatial language for video search
Pub Date: 2010-11-08. DOI: 10.1145/1891903.1891944
Stefanie Tellex, T. Kollar, George Shaw, N. Roy, D. Roy
The ability to find a video clip that matches a natural language description of an event would enable intuitive search of large databases of surveillance video. We present a mechanism for connecting a spatial language query to a video clip corresponding to the query. The system can retrieve video clips matching millions of potential queries that describe complex events in video such as "people walking from the hallway door, around the island, to the kitchen sink." By breaking down the query into a sequence of independent structured clauses and modeling the meaning of each component of the structure separately, we are able to improve on previous approaches to video retrieval by finding clips that match much longer and more complex queries using a rich set of spatial relations such as "down" and "past." We present a rigorous analysis of the system's performance, based on a large corpus of task-constrained language collected from fourteen subjects. Using this corpus, we show that the system effectively retrieves clips that match natural language descriptions: 58.3% were ranked in the top two of ten in a retrieval task. Furthermore, we show that spatial relations play an important role in the system's performance.
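To make the clause-by-clause decomposition concrete, here is a minimal sketch of how a query like the example above could be split into (relation, ground) clauses whose scores are combined independently per candidate person track. The clause scorers are toy stand-ins, not the paper's trained spatial-relation models, and all zone names are illustrative.

```python
# Toy per-relation scorers; the real system learns models for spatial
# relations such as "from", "around", "past", and "to".
def score_from(track, ground):
    return 0.9 if track["start_zone"] == ground else 0.1

def score_to(track, ground):
    return 0.9 if track["end_zone"] == ground else 0.1

def score_around(track, ground):
    return 0.8 if ground in track["zones_circled"] else 0.1

SCORERS = {"from": score_from, "around": score_around, "to": score_to}

def score_track(track, clauses):
    """Multiply independent per-clause scores, mirroring the paper's
    decomposition of a query into structured clauses."""
    score = 1.0
    for relation, ground in clauses:
        score *= SCORERS[relation](track, ground)
    return score

# "people walking from the hallway door, around the island, to the
# kitchen sink" expressed as three structured clauses:
query = [("from", "hallway_door"), ("around", "island"), ("to", "kitchen_sink")]
track = {"start_zone": "hallway_door", "end_zone": "kitchen_sink",
         "zones_circled": {"island"}}
print(score_track(track, query))  # 0.9 * 0.8 * 0.9 = 0.648
```

Because each clause is scored separately, long queries compose from a small inventory of relation models rather than requiring training data for every full sentence.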
Citations: 21
Recommendation from robots in a real-world retail shop
Pub Date: 2010-11-08. DOI: 10.1145/1891903.1891929
Koji Kamei, K. Shinozawa, Tetsushi Ikeda, A. Utsumi, T. Miyashita, N. Hagita
By applying network robot technologies, we incorporate recommendation methods from e-commerce into a real-world retail shop. We constructed an experimental shop environment in which communication robots recommend specific items to customers according to their purchasing behavior, as observed by networked sensors. A recommendation scenario is implemented with three robots and investigated through an experiment. The results indicate that participants stayed longer in front of the shelves when the communication robots tried to interact with them, and were influenced to carry out purchasing behaviors similar to those observed earlier. Other results suggest that the probability of customers' zone transitions can be used to anticipate their purchasing behavior.
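The finding that zone transitions predict purchasing behavior suggests a simple first-order model. Below is a minimal sketch, not the authors' method, that estimates P(next zone | current zone) from observed customer paths; the shop zones and paths are hypothetical.

```python
from collections import Counter, defaultdict

def fit_transitions(zone_sequences):
    """Estimate P(next_zone | current_zone) from observed customer paths."""
    counts = defaultdict(Counter)
    for seq in zone_sequences:
        for a, b in zip(seq, seq[1:]):
            counts[a][b] += 1
    model = {}
    for zone, followers in counts.items():
        total = sum(followers.values())
        model[zone] = {nxt: n / total for nxt, n in followers.items()}
    return model

# Made-up paths through hypothetical shop zones.
paths = [["entrance", "shelf_A", "shelf_B", "register"],
         ["entrance", "shelf_B", "register"]]
model = fit_transitions(paths)
print(model["entrance"])  # {'shelf_A': 0.5, 'shelf_B': 0.5}
print(model["shelf_B"])   # {'register': 1.0}
```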
Citations: 42
Empathetic video experience through timely multimodal interaction
Pub Date: 2010-11-08. DOI: 10.1145/1891903.1891948
Myunghee Lee, G. Kim
In this paper, we describe a video playback system, named "Empatheater," that is controlled by multimodal interaction. As the video plays, the user must interact with and emulate predefined video "events" through multimodal guidance and whole-body interaction (e.g. following the main character's motion or gestures). Without the timely interaction, the video stops. The system shows guidance information on how to properly react and continue the video playback. The purpose of such a system is to provide an indirect experience (of the given video content) by eliciting the user to mimic and empathize with the main character. The user is given the illusion (suspended disbelief) of playing an active role in the unfolding video content. We discuss various features of the newly proposed interactive medium. In addition, we report the results of a pilot study carried out to evaluate its user experience compared to passive video viewing and keyboard-based video control.
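The core interaction loop, where playback halts until the expected action is performed, can be illustrated with a short sketch. This is written under assumed behavior: the recognizer callable, the grace period, and the event names are hypothetical, not details from the paper.

```python
import time

def play_with_events(events, action_recognized, grace_s=3.0):
    """Step through scripted video events; stop playback and show
    guidance whenever the expected whole-body action has not been
    recognized within the grace period."""
    for event in events:
        deadline = time.monotonic() + grace_s
        while not action_recognized(event):
            if time.monotonic() > deadline:
                print(f"video paused; guidance: perform '{event}'")
                deadline = time.monotonic() + grace_s
            time.sleep(0.1)
        print(f"event '{event}' matched, video continues")

# Hypothetical usage with a recognizer stub that always succeeds.
play_with_events(["follow character's wave", "mimic bow"], lambda e: True)
```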
Citations: 3
Evidence-based automated traffic hazard zone mapping using wearable sensors
Pub Date: 2010-11-08. DOI: 10.1145/1891903.1891957
Masahiro Tada, H. Noma, K. Renge
Recently, underestimation of traffic-condition risk has been considered one of the biggest causes of traffic accidents. In this paper, we propose an evidence-based automatic hazard zone mapping method using wearable sensors. We measure the driver's behavior using three-axis gyro sensors. By analyzing the measured motion data, the proposed method can label characteristic motions that are observed in hazard zones. We gathered motion data sets from two types of drivers, i.e., a driving school instructor and an ordinary driver, and then generated a traffic hazard zone map based on the differences between their motions. Through an experiment on public roads, we confirmed that our method can extract hazard zones.
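As a rough illustration of labeling characteristic motions from three-axis gyro data, the sketch below flags windows whose angular-velocity energy exceeds a threshold. The windowing, threshold, and synthetic data are assumptions for illustration, not the paper's actual detector.

```python
import numpy as np

def label_characteristic_motion(gyro, fs=100, win_s=1.0, thresh=1.5):
    """Flag windows of 3-axis angular velocity whose mean energy
    exceeds a threshold, as a stand-in for detecting characteristic
    motions (e.g. head checks) near hazard zones."""
    win = int(fs * win_s)
    energy = np.linalg.norm(gyro, axis=1) ** 2
    labels = []
    for start in range(0, len(energy) - win, win):
        labels.append(energy[start:start + win].mean() > thresh)
    return np.array(labels)

# Illustrative: 10 s of synthetic gyro data with a motion burst at 4-5 s.
rng = np.random.default_rng(0)
gyro = rng.normal(0, 0.3, size=(1000, 3))
gyro[400:500] += 2.0
print(label_characteristic_motion(gyro))  # True only in the burst window
```

Flagged windows, tagged with GPS positions, could then be aggregated into a map; comparing the instructor's and the ordinary driver's labels highlights zones where expert check behavior is missing.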
Citations: 2
A multimodal interactive text generation system
Pub Date: 2010-11-08. DOI: 10.1145/1891903.1891918
Luis Rodríguez, I. García-Varea, Alejandro Revuelta-Martínez, E. Vidal
We present an interactive text generation system aimed at providing assistance for text typing in different environments. The system works by predicting what the user is going to type based on the text he or she typed previously. A multimodal interface is included, intended to facilitate text generation in constrained environments. The prototype is designed following a modular client-server architecture to provide high flexibility.
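Predicting "what the user is going to type" from previously typed text can be done, in its simplest form, with an n-gram language model. The bigram sketch below is a minimal stand-in for the system's predictor; the class name, training text, and API are illustrative assumptions.

```python
from collections import Counter, defaultdict

class BigramPredictor:
    """Minimal prefix-based typing prediction with a bigram model;
    a toy stand-in, not the paper's predictor."""

    def __init__(self, corpus):
        self.model = defaultdict(Counter)
        words = corpus.split()
        for a, b in zip(words, words[1:]):
            self.model[a][b] += 1

    def predict(self, previous_word, k=3):
        """Return the k most likely next words after previous_word."""
        return [w for w, _ in self.model[previous_word].most_common(k)]

p = BigramPredictor("the cat sat on the mat the cat ran")
print(p.predict("the"))  # ['cat', 'mat']
```

In a client-server design like the one described, the predictor would live on the server and the (possibly multimodal) client would request completions as the user types.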
Citations: 1
Cognitive skills learning: pen input patterns in computer-based athlete training
Pub Date: 2010-11-08. DOI: 10.1145/1891903.1891955
Natalie Ruiz, Qian Qian Feng, R. Taib, Tara Handke, Fang Chen
In this paper, we describe a longitudinal user study in which athletes used a cognitive training tool equipped with an interactive pen interface, together with think-aloud protocols. The aim is to verify whether cognitive load can be inferred directly from changes in geometric and temporal features of the pen trajectories. We compare trajectories across cognitive load levels and overall Pre and Post training tests. The results show that trajectory durations and lengths decrease while speeds increase, all significantly, as cognitive load increases. These changes are attributed to mechanisms for dealing with high cognitive load in working memory, with minimal rehearsal. With more expertise, trajectory durations further decrease and speeds further increase, which is attributed in part to cognitive skill acquisition and to schema development, in both extraneous and intrinsic networks, between Pre and Post tests. As such, these pen trajectory features offer insight into implicit communicative changes related to load fluctuations.
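The three trajectory features tracked in the study (duration, length, speed) are straightforward to compute from timestamped pen samples. A minimal sketch with made-up sample data:

```python
import math

def trajectory_features(points):
    """Compute duration, path length, and mean speed of one pen
    trajectory given (t, x, y) samples."""
    duration = points[-1][0] - points[0][0]
    length = sum(math.hypot(x2 - x1, y2 - y1)
                 for (_, x1, y1), (_, x2, y2) in zip(points, points[1:]))
    return {"duration": duration, "length": length,
            "speed": length / duration if duration else 0.0}

# Illustrative stroke: four samples at 10 ms intervals.
stroke = [(0.00, 0, 0), (0.01, 3, 4), (0.02, 6, 8), (0.03, 9, 12)]
print(trajectory_features(stroke))
# {'duration': 0.03, 'length': 15.0, 'speed': 500.0}
```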
Citations: 11
A language-based approach to indexing heterogeneous multimedia lifelog
Pub Date: 2010-11-08. DOI: 10.1145/1891903.1891937
Peng-Wen Chen, Snehal Kumar Chennuru, S. Buthpitiya, Y. Zhang
Lifelog systems, inspired by Vannevar Bush's concept of "MEMory EXtenders" (MEMEX), are capable of storing a person's lifetime experience as a multimedia database. Despite such systems' huge potential for improving people's everyday life, there are major challenges that need to be addressed to make such systems practical. One of them is how to index the inherently large and heterogeneous lifelog data so that a person can efficiently retrieve the log segments that are of interest. In this paper, we present a novel approach to indexing lifelogs using activity language. By quantizing the heterogeneous high dimensional sensory data into text representation, we are able to apply statistical natural language processing techniques to index, recognize, segment, cluster, retrieve, and infer high-level semantic meanings of the collected lifelogs. Based on this indexing approach, our lifelog system supports easy retrieval of log segments representing past similar activities and generation of salient summaries serving as overviews of segments.
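The key step, quantizing heterogeneous sensor streams into a "text" of activity symbols, can be sketched as nearest-codeword assignment. The 2-D frames and three-word codebook below are toy assumptions; the paper's pipeline would then apply statistical NLP techniques (indexing, segmentation, retrieval) to such strings.

```python
import numpy as np

def quantize(frames, codebook):
    """Map each sensor frame to its nearest codeword, turning a
    heterogeneous signal into a string of activity symbols."""
    dists = np.linalg.norm(frames[:, None, :] - codebook[None, :, :], axis=2)
    return "".join(chr(ord("a") + i) for i in dists.argmin(axis=1))

# Hypothetical 2-D sensor summaries and a 3-word codebook.
codebook = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
frames = np.array([[0.1, 0.1], [0.9, 0.1], [0.1, 0.9], [0.9, 0.0]])
print(quantize(frames, codebook))  # "abcb", now amenable to n-gram search
```

Once the lifelog is a string, retrieving "past similar activities" reduces to familiar text operations such as n-gram matching or tf-idf ranking.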
Citations: 8
Key-press gestures recognition and interaction based on SEMG signals
Pub Date: 2010-11-08. DOI: 10.1145/1891903.1891950
Juan Cheng, Xiang Chen, Zhiyuan Lu, Kongqiao Wang, M. Shen
This article investigates the pattern recognition of key-press finger gestures based on surface electromyographic (SEMG) signals and the feasibility of key-press gestures for interaction applications. Two kinds of recognition experiments were first designed to explore the feasibility and repeatability of the SEMG-based classification of 16 key-press finger gestures of the right hand and 4 control gestures, with the key-press gestures defined with reference to the standard PC keyboard. Based on the experimental results, 10 well-recognized key-press gestures were selected as the numeric input keys of a simulated phone, and the 4 control gestures were mapped to 4 control keys. Two types of use tests, namely volume setting and SMS sending, were then conducted to survey the gesture-based interaction performance and users' attitudes toward this technique; the test results showed that users could accept this novel input strategy as a fresh experience.
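A typical SEMG gesture-classification pipeline extracts per-channel features from short signal windows and feeds them to a classifier. The sketch below uses RMS amplitude and an off-the-shelf SVM on synthetic data as stand-ins; the feature set, channel count, and classifier choice are assumptions, not the article's exact setup.

```python
import numpy as np
from sklearn.svm import SVC

def rms_features(window):
    """Root-mean-square amplitude per SEMG channel, a common and
    simple feature for gesture classification."""
    return np.sqrt((window ** 2).mean(axis=0))

# Synthetic stand-in data: 40 windows of 4-channel SEMG for two
# gestures that differ in overall muscle-activation level.
rng = np.random.default_rng(1)
X = np.vstack([rms_features(rng.normal(0, s, size=(128, 4)))
               for s in [0.5] * 20 + [1.5] * 20])
y = [0] * 20 + [1] * 20
clf = SVC().fit(X, y)
print(clf.predict(X[:2]), clf.predict(X[-2:]))  # expect [0 0] [1 1]
```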
Citations: 13
Discovering eye gaze behavior during human-agent conversation in an interactive storytelling application
Pub Date: 2010-11-08. DOI: 10.1145/1891903.1891915
Nikolaus Bee, J. Wagner, E. André, Thurid Vogt, Fred Charles, D. Pizzi, M. Cavazza
In this paper, we investigate the user's eye gaze behavior during conversation with an interactive storytelling application. We present an interactive eye gaze model for embodied conversational agents in order to improve the experience of users participating in interactive storytelling. The underlying narrative in which the approach was tested is based on a classic 19th-century psychological novel: Madame Bovary, by Flaubert. At various stages of the narrative, the user can address the main character or respond to her using free-style spoken natural language input, impersonating her lover. An eye tracker was connected to enable the interactive gaze model to respond to the user's current gaze (i.e. looking into the virtual character's eyes or not). We conducted a study with 19 students in which we compared our interactive eye gaze model with a non-interactive eye gaze model that was informed by studies of human gaze behaviors but had no information on where the user was looking. The interactive model achieved a higher score in user ratings than the non-interactive model. In addition, we analyzed the users' gaze behavior during the conversation with the virtual character.
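An interactive gaze model of this kind reacts to whether the user is currently looking into the character's eyes. A toy policy in that spirit, with made-up states and thresholds rather than the paper's actual model:

```python
def agent_gaze(user_looking_at_agent, mutual_gaze_time, max_mutual_s=2.0):
    """Pick the agent's gaze target from the user's current gaze:
    hold mutual gaze briefly, then avert to avoid staring.
    States and the 2 s threshold are illustrative assumptions."""
    if user_looking_at_agent:
        return "user_eyes" if mutual_gaze_time < max_mutual_s else "look_away"
    return "idle_scan"

print(agent_gaze(True, 0.5))   # 'user_eyes'
print(agent_gaze(True, 3.0))   # 'look_away'
print(agent_gaze(False, 0.0))  # 'idle_scan'
```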
Citations: 27