
Latest Publications: Proceedings of the 2015 ACM on International Conference on Multimodal Interaction

Automatic Detection of Mind Wandering During Reading Using Gaze and Physiology
R. Bixler, Nathaniel Blanchard, L. Garrison, S. D’Mello
Mind wandering (MW) entails an involuntary shift in attention from task-related thoughts to task-unrelated thoughts, and has been shown to have detrimental effects on performance in a number of contexts. This paper proposes an automated multimodal detector of MW using eye gaze and physiology (skin conductance and skin temperature) and aspects of the context (e.g., time on task, task difficulty). Data in the form of eye gaze and physiological signals were collected as 178 participants read four instructional texts from a computer interface. Participants periodically provided self-reports of MW in response to pseudorandom auditory probes during reading. Supervised machine learning models trained on features extracted from participants' gaze fixations, physiological signals, and contextual cues were used to detect pages where participants provided positive responses of MW to the auditory probes. Two methods of combining gaze and physiology features were explored. Feature level fusion entailed building a single model by combining feature vectors from individual modalities. Decision level fusion entailed building individual models for each modality and adjudicating amongst individual decisions. Feature level fusion resulted in an 11% improvement in classification accuracy over the best unimodal model, but there was no comparable improvement for decision level fusion. This was reflected by a small improvement in both precision and recall. An analysis of the features indicated that MW was associated with fewer and longer fixations and saccades, and a higher more deterministic skin temperature. Possible applications of the detector are discussed.
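The feature-level and decision-level fusion strategies contrasted in this abstract are standard multimodal patterns. Below is a minimal sketch of both on synthetic data, using scikit-learn logistic regression; the feature names, classifier choice, and data are illustrative assumptions, not the detector built in the paper.

```python
# Minimal sketch of feature-level vs. decision-level fusion (illustrative only;
# synthetic data, not the mind-wandering detector described in the paper).
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)
n = 400
gaze = rng.normal(size=(n, 6))        # e.g., fixation count/duration features (assumed)
physio = rng.normal(size=(n, 4))      # e.g., skin conductance/temperature features (assumed)
y = rng.integers(0, 2, size=n)        # 1 = self-reported mind wandering (fake labels)

Xg_tr, Xg_te, Xp_tr, Xp_te, y_tr, y_te = train_test_split(
    gaze, physio, y, test_size=0.3, random_state=0)

# Feature-level fusion: concatenate modality feature vectors, train one model.
feat_model = LogisticRegression(max_iter=1000).fit(np.hstack([Xg_tr, Xp_tr]), y_tr)
feat_pred = feat_model.predict(np.hstack([Xg_te, Xp_te]))

# Decision-level fusion: one model per modality, then adjudicate among their
# decisions (here: average the predicted probabilities and threshold).
gaze_model = LogisticRegression(max_iter=1000).fit(Xg_tr, y_tr)
phys_model = LogisticRegression(max_iter=1000).fit(Xp_tr, y_tr)
avg_prob = (gaze_model.predict_proba(Xg_te)[:, 1] +
            phys_model.predict_proba(Xp_te)[:, 1]) / 2
dec_pred = (avg_prob >= 0.5).astype(int)

print("feature-level fusion accuracy:", accuracy_score(y_te, feat_pred))
print("decision-level fusion accuracy:", accuracy_score(y_te, dec_pred))
```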
{"title":"Automatic Detection of Mind Wandering During Reading Using Gaze and Physiology","authors":"R. Bixler, Nathaniel Blanchard, L. Garrison, S. D’Mello","doi":"10.1145/2818346.2820742","DOIUrl":"https://doi.org/10.1145/2818346.2820742","url":null,"abstract":"Mind wandering (MW) entails an involuntary shift in attention from task-related thoughts to task-unrelated thoughts, and has been shown to have detrimental effects on performance in a number of contexts. This paper proposes an automated multimodal detector of MW using eye gaze and physiology (skin conductance and skin temperature) and aspects of the context (e.g., time on task, task difficulty). Data in the form of eye gaze and physiological signals were collected as 178 participants read four instructional texts from a computer interface. Participants periodically provided self-reports of MW in response to pseudorandom auditory probes during reading. Supervised machine learning models trained on features extracted from participants' gaze fixations, physiological signals, and contextual cues were used to detect pages where participants provided positive responses of MW to the auditory probes. Two methods of combining gaze and physiology features were explored. Feature level fusion entailed building a single model by combining feature vectors from individual modalities. Decision level fusion entailed building individual models for each modality and adjudicating amongst individual decisions. Feature level fusion resulted in an 11% improvement in classification accuracy over the best unimodal model, but there was no comparable improvement for decision level fusion. This was reflected by a small improvement in both precision and recall. An analysis of the features indicated that MW was associated with fewer and longer fixations and saccades, and a higher more deterministic skin temperature. Possible applications of the detector are discussed.","PeriodicalId":20486,"journal":{"name":"Proceedings of the 2015 ACM on International Conference on Multimodal Interaction","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2015-11-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"76362634","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 30
Multimodal Affect Detection in the Wild: Accuracy, Availability, and Generalizability
Nigel Bosch
Affect detection is an important component of computerized learning environments that adapt the interface and materials to students' affect. This paper proposes a plan for developing and testing multimodal affect detectors that generalize across differences in data that are likely to occur in practical applications (e.g., time, demographic variables). Facial features and interaction log features are considered as modalities for affect detection in this scenario, each with their own advantages. Results are presented for completed work evaluating the accuracy of individual modality face- and interaction- based detectors, accuracy and availability of a multimodal combination of these modalities, and initial steps toward generalization of face-based detectors. Additional data collection needed for cross-culture generalization testing is also completed. Challenges and possible solutions for proposed cross-cultural generalization testing of multimodal detectors are also discussed.
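Because face features are only available in windows where a face is actually detected, any multimodal combination has to handle a missing modality. The snippet below is a hypothetical fallback scheme (average the two detectors' probabilities when both are present, otherwise back off to the interaction-based detector); it illustrates the availability issue, not the author's method.

```python
# Hypothetical fallback combination of face- and interaction-based detectors.
# Probabilities are made up; the scheme is illustrative, not the paper's method.
def combine(face_prob, interaction_prob):
    """Return a fused affect probability.

    face_prob may be None when no face was detected in the window,
    in which case only the interaction-based detector is used.
    """
    if face_prob is None:
        return interaction_prob
    return 0.5 * face_prob + 0.5 * interaction_prob

print(combine(0.8, 0.4))   # both modalities available -> 0.6
print(combine(None, 0.4))  # face unavailable -> falls back to 0.4
```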
{"title":"Multimodal Affect Detection in the Wild: Accuracy, Availability, and Generalizability","authors":"Nigel Bosch","doi":"10.1145/2818346.2823316","DOIUrl":"https://doi.org/10.1145/2818346.2823316","url":null,"abstract":"Affect detection is an important component of computerized learning environments that adapt the interface and materials to students' affect. This paper proposes a plan for developing and testing multimodal affect detectors that generalize across differences in data that are likely to occur in practical applications (e.g., time, demographic variables). Facial features and interaction log features are considered as modalities for affect detection in this scenario, each with their own advantages. Results are presented for completed work evaluating the accuracy of individual modality face- and interaction- based detectors, accuracy and availability of a multimodal combination of these modalities, and initial steps toward generalization of face-based detectors. Additional data collection needed for cross-culture generalization testing is also completed. Challenges and possible solutions for proposed cross-cultural generalization testing of multimodal detectors are also discussed.","PeriodicalId":20486,"journal":{"name":"Proceedings of the 2015 ACM on International Conference on Multimodal Interaction","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2015-11-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"80808640","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 9
Who's Speaking?: Audio-Supervised Classification of Active Speakers in Video
Punarjay Chakravarty, S. Mirzaei, T. Tuytelaars, H. V. hamme
Active speakers have traditionally been identified in video by detecting their moving lips. This paper demonstrates the same using spatio-temporal features that aim to capture other cues: movement of the head, upper body and hands of active speakers. Speaker directional information, obtained using sound source localization from a microphone array is used to supervise the training of these video features.
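The key step is cross-modal supervision: azimuth estimates from the microphone array become per-frame active-speaker labels for the video tracks, and those labels train the spatio-temporal video features. A hypothetical sketch of that labeling step, assuming a localizer that returns one azimuth per frame and a known nominal azimuth per on-screen person:

```python
# Hypothetical cross-modal labeling: turn sound-source azimuths into
# active-speaker labels for video tracks (illustrative assumptions throughout).
def label_frames(azimuths, person_sectors, tolerance=10.0):
    """azimuths: per-frame azimuth (degrees) from a sound-source localizer.
    person_sectors: {person_id: nominal azimuth of that person's video track}.
    Returns a per-frame label (person_id or None) that could supervise a
    video-feature classifier."""
    labels = []
    for az in azimuths:
        best, best_err = None, tolerance
        for pid, sector in person_sectors.items():
            err = abs(az - sector)
            if err <= best_err:
                best, best_err = pid, err
        labels.append(best)
    return labels

print(label_frames([31.0, 32.5, 75.0], {"A": 30.0, "B": 90.0}))
# ['A', 'A', None] -> the first two frames supervise person A's video features;
# frames with no confident audio match receive no label.
```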
{"title":"Who's Speaking?: Audio-Supervised Classification of Active Speakers in Video","authors":"Punarjay Chakravarty, S. Mirzaei, T. Tuytelaars, H. V. hamme","doi":"10.1145/2818346.2820780","DOIUrl":"https://doi.org/10.1145/2818346.2820780","url":null,"abstract":"Active speakers have traditionally been identified in video by detecting their moving lips. This paper demonstrates the same using spatio-temporal features that aim to capture other cues: movement of the head, upper body and hands of active speakers. Speaker directional information, obtained using sound source localization from a microphone array is used to supervise the training of these video features.","PeriodicalId":20486,"journal":{"name":"Proceedings of the 2015 ACM on International Conference on Multimodal Interaction","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2015-11-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"85668989","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 35
Leveraging Behavioral Patterns of Mobile Applications for Personalized Spoken Language Understanding
Yun-Nung (Vivian) Chen, Ming Sun, Alexander I. Rudnicky, A. Gershman
Spoken language interfaces are appearing in various smart devices (e.g. smart-phones, smart-TV, in-car navigating systems) and serve as intelligent assistants (IAs). However, most of them do not consider individual users' behavioral profiles and contexts when modeling user intents. Such behavioral patterns are user-specific and provide useful cues to improve spoken language understanding (SLU). This paper focuses on leveraging the app behavior history to improve spoken dialog systems performance. We developed a matrix factorization approach that models speech and app usage patterns to predict user intents (e.g. launching a specific app). We collected multi-turn interactions in a WoZ scenario; users were asked to reproduce the multi-app tasks that they had performed earlier on their smart-phones. By modeling latent semantics behind lexical and behavioral patterns, the proposed multi-model system achieves about 52% of turn accuracy for intent prediction on ASR transcripts.
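One way to picture the matrix-factorization idea: build a turn-by-feature matrix whose columns mix lexical cues from the utterance with recent app usage, factorize it into low-rank latent factors, and score candidate apps from the reconstruction. The sketch below does this with scikit-learn's NMF on a tiny invented matrix; the features, dimensions, and scoring are assumptions for illustration, not the system described in the paper.

```python
# Illustrative latent-factor model over joint speech/app-usage features
# (invented data; not the paper's matrix-factorization system).
import numpy as np
from sklearn.decomposition import NMF

# Rows: dialog turns.  Columns: [word "email", word "photo", word "navigate",
#                                used Mail recently, used Camera recently, used Maps recently]
X = np.array([
    [1, 0, 0, 1, 0, 0],
    [0, 1, 0, 0, 1, 0],
    [0, 0, 1, 0, 0, 1],
    [1, 0, 0, 0, 1, 0],
], dtype=float)

model = NMF(n_components=2, init="nndsvda", random_state=0, max_iter=500)
W = model.fit_transform(X)          # latent representation of each turn
H = model.components_               # latent representation of each feature

# Score the "app" columns (last three) for a new turn where only the word
# "email" was observed; the reconstruction fills in the likely app to launch.
new_turn = np.array([[1, 0, 0, 0, 0, 0]], dtype=float)
w_new = model.transform(new_turn)
app_scores = (w_new @ H)[0, 3:]
print("scores for [Mail, Camera, Maps]:", app_scores)
```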
{"title":"Leveraging Behavioral Patterns of Mobile Applications for Personalized Spoken Language Understanding","authors":"Yun-Nung (Vivian) Chen, Ming Sun, Alexander I. Rudnicky, A. Gershman","doi":"10.1145/2818346.2820781","DOIUrl":"https://doi.org/10.1145/2818346.2820781","url":null,"abstract":"Spoken language interfaces are appearing in various smart devices (e.g. smart-phones, smart-TV, in-car navigating systems) and serve as intelligent assistants (IAs). However, most of them do not consider individual users' behavioral profiles and contexts when modeling user intents. Such behavioral patterns are user-specific and provide useful cues to improve spoken language understanding (SLU). This paper focuses on leveraging the app behavior history to improve spoken dialog systems performance. We developed a matrix factorization approach that models speech and app usage patterns to predict user intents (e.g. launching a specific app). We collected multi-turn interactions in a WoZ scenario; users were asked to reproduce the multi-app tasks that they had performed earlier on their smart-phones. By modeling latent semantics behind lexical and behavioral patterns, the proposed multi-model system achieves about 52% of turn accuracy for intent prediction on ASR transcripts.","PeriodicalId":20486,"journal":{"name":"Proceedings of the 2015 ACM on International Conference on Multimodal Interaction","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2015-11-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"84886857","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 32
Recurrent Neural Networks for Emotion Recognition in Video
S. Kahou, Vincent Michalski, K. Konda, R. Memisevic, C. Pal
Deep learning based approaches to facial analysis and video analysis have recently demonstrated high performance on a variety of key tasks such as face recognition, emotion recognition and activity recognition. In the case of video, information often must be aggregated across a variable length sequence of frames to produce a classification result. Prior work using convolutional neural networks (CNNs) for emotion recognition in video has relied on temporal averaging and pooling operations reminiscent of widely used approaches for the spatial aggregation of information. Recurrent neural networks (RNNs) have seen an explosion of recent interest as they yield state-of-the-art performance on a variety of sequence analysis tasks. RNNs provide an attractive framework for propagating information over a sequence using a continuous valued hidden layer representation. In this work we present a complete system for the 2015 Emotion Recognition in the Wild (EmotiW) Challenge. We focus our presentation and experimental analysis on a hybrid CNN-RNN architecture for facial expression analysis that can outperform a previously applied CNN approach using temporal averaging for aggregation.
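The hybrid architecture replaces temporal averaging with a recurrent layer that aggregates a variable-length sequence of per-frame CNN features into one video-level prediction. A minimal PyTorch sketch of that aggregation head, assuming frame features have already been extracted by a face CNN (feature and layer sizes are arbitrary assumptions, not the EmotiW system itself):

```python
# Minimal sketch of RNN aggregation over per-frame CNN features
# (assumed dimensions; not the authors' EmotiW submission).
import torch
import torch.nn as nn

class CnnRnnHead(nn.Module):
    def __init__(self, feat_dim=512, hidden=128, n_classes=7):
        super().__init__()
        self.rnn = nn.GRU(feat_dim, hidden, batch_first=True)
        self.cls = nn.Linear(hidden, n_classes)

    def forward(self, frame_feats):          # (batch, time, feat_dim)
        _, h_last = self.rnn(frame_feats)    # final hidden state: (1, batch, hidden)
        return self.cls(h_last.squeeze(0))   # per-video class logits

# A "video" of 24 frames, each already encoded by a face CNN into a 512-d feature.
video = torch.randn(1, 24, 512)
logits = CnnRnnHead()(video)
print(logits.shape)   # torch.Size([1, 7]) -- one score per emotion class
```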
{"title":"Recurrent Neural Networks for Emotion Recognition in Video","authors":"S. Kahou, Vincent Michalski, K. Konda, R. Memisevic, C. Pal","doi":"10.1145/2818346.2830596","DOIUrl":"https://doi.org/10.1145/2818346.2830596","url":null,"abstract":"Deep learning based approaches to facial analysis and video analysis have recently demonstrated high performance on a variety of key tasks such as face recognition, emotion recognition and activity recognition. In the case of video, information often must be aggregated across a variable length sequence of frames to produce a classification result. Prior work using convolutional neural networks (CNNs) for emotion recognition in video has relied on temporal averaging and pooling operations reminiscent of widely used approaches for the spatial aggregation of information. Recurrent neural networks (RNNs) have seen an explosion of recent interest as they yield state-of-the-art performance on a variety of sequence analysis tasks. RNNs provide an attractive framework for propagating information over a sequence using a continuous valued hidden layer representation. In this work we present a complete system for the 2015 Emotion Recognition in the Wild (EmotiW) Challenge. We focus our presentation and experimental analysis on a hybrid CNN-RNN architecture for facial expression analysis that can outperform a previously applied CNN approach using temporal averaging for aggregation.","PeriodicalId":20486,"journal":{"name":"Proceedings of the 2015 ACM on International Conference on Multimodal Interaction","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2015-11-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"74090351","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 321
Detecting Mastication: A Wearable Approach
Abdelkareem Bedri, Apoorva Verlekar, Edison Thomaz, Valerie Avva, Thad Starner
We explore using the Outer Ear Interface (OEI) to recognize eating activities. OEI contains a 3D gyroscope and a set of proximity sensors encapsulated in an off-the-shelf earpiece to monitor jaw movement by measuring ear canal deformation. In a laboratory setting with 20 participants, OEI could distinguish eating from other activities, such as walking, talking, and silently reading, with over 90% accuracy (user independent). In a second study, six subjects wore the system for 6 hours each while performing their normal daily activities. OEI correctly classified five minute segments of time as eating or non-eating with 93% accuracy (user dependent).
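The second study's evaluation operates on fixed five-minute windows: the sensor streams are segmented, summary features are computed per segment, and a classifier labels each segment as eating or non-eating. A hypothetical sketch of that windowing step with an invented sampling rate, feature set, and classifier (not the OEI pipeline):

```python
# Hypothetical segmentation of a gyroscope/proximity stream into five-minute
# windows with simple summary features (not the OEI feature set).
import numpy as np
from sklearn.ensemble import RandomForestClassifier

FS = 50                      # assumed sampling rate (Hz)
WINDOW = 5 * 60 * FS         # samples per five-minute segment

def segment_features(signal):
    """Split a 1-D sensor stream into five-minute windows and compute
    mean, standard deviation, and energy per window."""
    n_win = len(signal) // WINDOW
    windows = signal[:n_win * WINDOW].reshape(n_win, WINDOW)
    return np.column_stack([windows.mean(1), windows.std(1), (windows ** 2).mean(1)])

rng = np.random.default_rng(0)
stream = rng.normal(size=6 * 60 * 60 * FS)      # six hours of one sensor channel
X = segment_features(stream)
y = rng.integers(0, 2, size=len(X))             # fake eating / non-eating labels
clf = RandomForestClassifier(random_state=0).fit(X, y)
print(X.shape, clf.predict(X[:3]))
```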
{"title":"Detecting Mastication: A Wearable Approach","authors":"Abdelkareem Bedri, Apoorva Verlekar, Edison Thomaz, Valerie Avva, Thad Starner","doi":"10.1145/2818346.2820767","DOIUrl":"https://doi.org/10.1145/2818346.2820767","url":null,"abstract":"We explore using the Outer Ear Interface (OEI) to recognize eating activities. OEI contains a 3D gyroscope and a set of proximity sensors encapsulated in an off-the-shelf earpiece to monitor jaw movement by measuring ear canal deformation. In a laboratory setting with 20 participants, OEI could distinguish eating from other activities, such as walking, talking, and silently reading, with over 90% accuracy (user independent). In a second study, six subjects wore the system for 6 hours each while performing their normal daily activities. OEI correctly classified five minute segments of time as eating or non-eating with 93% accuracy (user dependent).","PeriodicalId":20486,"journal":{"name":"Proceedings of the 2015 ACM on International Conference on Multimodal Interaction","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2015-11-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"82304242","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 54
Implicit Human-computer Interaction: Two Complementary Approaches
Julia Wache
One of the main goals in Human Computer Interaction (HCI) is improving the interface between users and computers: Interfacing should be intuitive, effortless and easy to learn. We approach the goal from two opposite but complementary directions: On the one hand, computer-user interaction can be enhanced if the computer can assess users differences in an automated manner. Therefore we collected physiological and psychological data from people exposed to emotional stimuli and created a database for the community to use for further research in the context of automated learning to detect the differences in the inner states of users. We employed the data both to not only predict the emotional state of users but also their personality traits. On the other hand, users need information dispatched by a computer to be easily, intuitively accessible. To minimize the cognitive effort of assimilating information we use a tactile device in form of a belt and test how it can be best used to replace or augment the information received from other senses (e.g., visual and auditory) in a navigation task. We investigate how both approaches can be combined to improve specific applications.
{"title":"Implicit Human-computer Interaction: Two Complementary Approaches","authors":"Julia Wache","doi":"10.1145/2818346.2823311","DOIUrl":"https://doi.org/10.1145/2818346.2823311","url":null,"abstract":"One of the main goals in Human Computer Interaction (HCI) is improving the interface between users and computers: Interfacing should be intuitive, effortless and easy to learn. We approach the goal from two opposite but complementary directions: On the one hand, computer-user interaction can be enhanced if the computer can assess users differences in an automated manner. Therefore we collected physiological and psychological data from people exposed to emotional stimuli and created a database for the community to use for further research in the context of automated learning to detect the differences in the inner states of users. We employed the data both to not only predict the emotional state of users but also their personality traits. On the other hand, users need information dispatched by a computer to be easily, intuitively accessible. To minimize the cognitive effort of assimilating information we use a tactile device in form of a belt and test how it can be best used to replace or augment the information received from other senses (e.g., visual and auditory) in a navigation task. We investigate how both approaches can be combined to improve specific applications.","PeriodicalId":20486,"journal":{"name":"Proceedings of the 2015 ACM on International Conference on Multimodal Interaction","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2015-11-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"78006128","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 2
Classification of Children's Social Dominance in Group Interactions with Robots
Sarah Strohkorb, Iolanda Leite, Natalie Warren, B. Scassellati
As social robots become more widespread in educational environments, their ability to understand group dynamics and engage multiple children in social interactions is crucial. Social dominance is a highly influential factor in social interactions, expressed through both verbal and nonverbal behaviors. In this paper, we present a method for determining whether a participant is high or low in social dominance in a group interaction with children and robots. We investigated the correlation between many verbal and nonverbal behavioral features with social dominance levels collected through teacher surveys. We additionally implemented Logistic Regression and Support Vector Machines models with classification accuracies of 81% and 89%, respectively, showing that using a small subset of nonverbal behavioral features, these models can successfully classify children's social dominance level. Our approach for classifying social dominance is novel not only for its application to children, but also for achieving high classification accuracies using a reduced set of nonverbal features that, in future work, can be automatically extracted with current sensing technology.
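The classification setup is a standard supervised pipeline: a small vector of nonverbal features per child, binary high/low dominance labels from the teacher surveys, and logistic regression or an SVM as the classifier. A minimal sketch on synthetic data follows; the feature names and data are invented, so the 81%/89% figures in the abstract come from the paper's own features, not from this sketch.

```python
# Minimal logistic-regression vs. SVM comparison on synthetic nonverbal
# features (invented data; illustrates the setup, not the paper's results).
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(0)
n = 120
# Columns might be e.g. speaking time, gaze toward the robot, gesture rate (assumed).
X = rng.normal(size=(n, 3))
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.5, size=n) > 0).astype(int)

for name, clf in [("logistic regression", LogisticRegression()),
                  ("SVM (RBF)", SVC(kernel="rbf"))]:
    pipe = make_pipeline(StandardScaler(), clf)   # scale features, then classify
    scores = cross_val_score(pipe, X, y, cv=5)
    print(f"{name}: mean CV accuracy = {scores.mean():.2f}")
```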
{"title":"Classification of Children's Social Dominance in Group Interactions with Robots","authors":"Sarah Strohkorb, Iolanda Leite, Natalie Warren, B. Scassellati","doi":"10.1145/2818346.2820735","DOIUrl":"https://doi.org/10.1145/2818346.2820735","url":null,"abstract":"As social robots become more widespread in educational environments, their ability to understand group dynamics and engage multiple children in social interactions is crucial. Social dominance is a highly influential factor in social interactions, expressed through both verbal and nonverbal behaviors. In this paper, we present a method for determining whether a participant is high or low in social dominance in a group interaction with children and robots. We investigated the correlation between many verbal and nonverbal behavioral features with social dominance levels collected through teacher surveys. We additionally implemented Logistic Regression and Support Vector Machines models with classification accuracies of 81% and 89%, respectively, showing that using a small subset of nonverbal behavioral features, these models can successfully classify children's social dominance level. Our approach for classifying social dominance is novel not only for its application to children, but also for achieving high classification accuracies using a reduced set of nonverbal features that, in future work, can be automatically extracted with current sensing technology.","PeriodicalId":20486,"journal":{"name":"Proceedings of the 2015 ACM on International Conference on Multimodal Interaction","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2015-11-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"91488971","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 26
Exploring Intent-driven Multimodal Interface for Geographical Information System
Feng Sun
Geographic Information Systems (GIS) offers a large amount of functions for performing spatial analysis and geospatial information retrieval. However, off-the-shelf GIS remains difficult to use for occasional GIS experts. The major problem lies in that its interface organizes spatial analysis tools and functions according to spatial data structures and corresponding algorithms, which is conceptually confusing and cognitively complex. Prior work identified the usability problem of conventional GIS interface and developed alternatives based on speech or gesture to narrow the gap between the high-functionality provided by GIS and its usability. This paper outlined my doctoral research goal in understanding human-GIS interaction activity, especially how interaction modalities assist to capture spatial analysis intention and influence collaborative spatial problem solving. We proposed a framework for enabling multimodal human-GIS interaction driven by intention. We also implemented a prototype GeoEASI (Geo-dialogue Environment for Assisted Spatial Inquiry) to demonstrate the effectiveness of our framework. GeoEASI understands commonly known spatial analysis intentions through multimodal techniques and is able to assist users to perform spatial analysis with proper strategies. Further work will evaluate the effectiveness of our framework, improve the reliability and flexibility of the system, extend the GIS interface for supporting multiple users, and integrate the system into GeoDeliberation. We will concentrate on how multimodality technology can be adopted in these circumstances and explore the potentials of it. The study aims to demonstrate the feasibility of building a GIS to be both useful and usable by introducing an intent-driven multimodal interface, forming the key to building a better theory of spatial thinking for GIS.
{"title":"Exploring Intent-driven Multimodal Interface for Geographical Information System","authors":"Feng Sun","doi":"10.1145/2818346.2823304","DOIUrl":"https://doi.org/10.1145/2818346.2823304","url":null,"abstract":"Geographic Information Systems (GIS) offers a large amount of functions for performing spatial analysis and geospatial information retrieval. However, off-the-shelf GIS remains difficult to use for occasional GIS experts. The major problem lies in that its interface organizes spatial analysis tools and functions according to spatial data structures and corresponding algorithms, which is conceptually confusing and cognitively complex. Prior work identified the usability problem of conventional GIS interface and developed alternatives based on speech or gesture to narrow the gap between the high-functionality provided by GIS and its usability. This paper outlined my doctoral research goal in understanding human-GIS interaction activity, especially how interaction modalities assist to capture spatial analysis intention and influence collaborative spatial problem solving. We proposed a framework for enabling multimodal human-GIS interaction driven by intention. We also implemented a prototype GeoEASI (Geo-dialogue Environment for Assisted Spatial Inquiry) to demonstrate the effectiveness of our framework. GeoEASI understands commonly known spatial analysis intentions through multimodal techniques and is able to assist users to perform spatial analysis with proper strategies. Further work will evaluate the effectiveness of our framework, improve the reliability and flexibility of the system, extend the GIS interface for supporting multiple users, and integrate the system into GeoDeliberation. We will concentrate on how multimodality technology can be adopted in these circumstances and explore the potentials of it. The study aims to demonstrate the feasibility of building a GIS to be both useful and usable by introducing an intent-driven multimodal interface, forming the key to building a better theory of spatial thinking for GIS.","PeriodicalId":20486,"journal":{"name":"Proceedings of the 2015 ACM on International Conference on Multimodal Interaction","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2015-11-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"86350987","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1
Detecting and Synthesizing Synchronous Joint Action in Human-Robot Teams
T. Iqbal, L. Riek
To become capable teammates to people, robots need the ability to interpret human activities and appropriately adjust their actions in real time. The goal of our research is to build robots that can work fluently and contingently with human teams. To this end, we have designed novel nonlinear dynamical methods to automatically model and detect synchronous joint action (SJA) in human teams. We also have extended this work to enable robots to move jointly with human teammates in real time. In this paper, we describe our work to date, and discuss our future research plans to further explore this research space. The results of this work are expected to benefit researchers in social signal processing, human-machine interaction, and robotics.
{"title":"Detecting and Synthesizing Synchronous Joint Action in Human-Robot Teams","authors":"T. Iqbal, L. Riek","doi":"10.1145/2818346.2823315","DOIUrl":"https://doi.org/10.1145/2818346.2823315","url":null,"abstract":"To become capable teammates to people, robots need the ability to interpret human activities and appropriately adjust their actions in real time. The goal of our research is to build robots that can work fluently and contingently with human teams. To this end, we have designed novel nonlinear dynamical methods to automatically model and detect synchronous joint action (SJA) in human teams. We also have extended this work to enable robots to move jointly with human teammates in real time. In this paper, we describe our work to date, and discuss our future research plans to further explore this research space. The results of this work are expected to benefit researchers in social signal processing, human-machine interaction, and robotics.","PeriodicalId":20486,"journal":{"name":"Proceedings of the 2015 ACM on International Conference on Multimodal Interaction","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2015-11-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"79645055","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 3