
Proceedings of the Workshop on Multimodal Analyses enabling Artificial Agents in Human-Machine Interaction: Latest Publications

Assessment of users' interests in multimodal dialog based on exchange unit
Sayaka Tomimasu, Masahiro Araki
A person is more likely to enjoy long-term conversations with a robot if it has the capability to infer the topics that interest the person. In this paper, we propose a method of deducing the specific topics that interest a user by sequentially assessing each exchange in a chat-oriented dialog session. We use multimodal information such as facial expressions and prosodic information obtained from the user's utterances for assessing interest as these parameters are independent of linguistic information that varies widely in chat-oriented dialogs. The results show that the accuracy of the assessment of the user's interest is better when we use both features.
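To make the setup concrete, here is a minimal Python sketch (not the authors' implementation; feature dimensions and data are synthetic assumptions) that classifies per-exchange interest from facial features, prosodic features, and their concatenation, mirroring the single- versus combined-feature comparison in the abstract.

```python
# Minimal sketch of per-exchange interest assessment; all data is synthetic and
# the feature sets only stand in for real facial-expression and prosodic features.
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

rng = np.random.default_rng(0)
n_exchanges = 200
facial = rng.normal(size=(n_exchanges, 10))      # hypothetical facial-expression features
prosody = rng.normal(size=(n_exchanges, 6))      # hypothetical prosodic features (F0, energy, ...)
interest = rng.integers(0, 2, size=n_exchanges)  # annotated interest label per exchange

for name, X in [("facial only", facial),
                ("prosody only", prosody),
                ("both", np.hstack([facial, prosody]))]:
    acc = cross_val_score(SVC(kernel="rbf"), X, interest, cv=5).mean()
    print(f"{name}: mean cross-validated accuracy = {acc:.2f}")
```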
{"title":"Assessment of users' interests in multimodal dialog based on exchange unit","authors":"Sayaka Tomimasu, Masahiro Araki","doi":"10.1145/3011263.3011269","DOIUrl":"https://doi.org/10.1145/3011263.3011269","url":null,"abstract":"A person is more likely to enjoy long-term conversations with a robot if it has the capability to infer the topics that interest the person. In this paper, we propose a method of deducing the specific topics that interest a user by sequentially assessing each exchange in a chat-oriented dialog session. We use multimodal information such as facial expressions and prosodic information obtained from the user's utterances for assessing interest as these parameters are independent of linguistic information that varies widely in chat-oriented dialogs. The results show that the accuracy of the assessment of the user's interest is better when we use both features.","PeriodicalId":272696,"journal":{"name":"Proceedings of the Workshop on Multimodal Analyses enabling Artificial Agents in Human-Machine Interaction","volume":"3 3-4 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-11-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128275180","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 6
Deictic gestures in coaching interactions
I. D. Kok, J. Hough, David Schlangen, S. Kopp
In motor skill coaching interactions, coaches use several techniques to improve the motor skill of the coachee. Through goal setting, explanations, instructions and feedback the coachee is motivated and guided to improve the motor skill. These verbal speech actions are often accompanied by iconic or deictic gestures and other nonverbal acts, such as demonstrations. We are building a virtual coach that is capable of the same behaviour. In this paper we take a closer look at the form, type and timing of deictic gestures in our corpus of human-human coaching interactions. We show that a significant proportion of the deictic gestures actually touch the referenced object, that most of the gestures are complementary (contrary to previous research), and that they often occur before the lexical affiliate.
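As an illustration of the timing analysis described above, the following small sketch (with made-up annotation records; the field names are hypothetical) computes how often a deictic gesture's stroke starts before its lexical affiliate and how often the gesture touches the referenced object.

```python
# Illustrative only: toy annotation records for deictic gestures in a coaching corpus.
from dataclasses import dataclass

@dataclass
class DeicticGesture:
    stroke_onset: float     # seconds into the recording
    affiliate_onset: float  # onset of the lexical affiliate (e.g. "this pedal")
    touches_referent: bool  # does the hand contact the referenced object?

gestures = [
    DeicticGesture(12.3, 12.9, True),
    DeicticGesture(45.0, 44.8, False),
    DeicticGesture(60.1, 60.6, True),
]

before = sum(g.stroke_onset < g.affiliate_onset for g in gestures) / len(gestures)
touching = sum(g.touches_referent for g in gestures) / len(gestures)
print(f"gestures starting before their lexical affiliate: {before:.0%}")
print(f"gestures touching the referenced object: {touching:.0%}")
```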
{"title":"Deictic gestures in coaching interactions","authors":"I. D. Kok, J. Hough, David Schlangen, S. Kopp","doi":"10.1145/3011263.3011267","DOIUrl":"https://doi.org/10.1145/3011263.3011267","url":null,"abstract":"In motor skill coaching interaction coaches use several techniques to improve the motor skill of the coachee. Through goal setting, explanations, instructions and feedback the coachee is motivated and guided to improve the motor skill. These verbal speech actions are often accompanied by iconic or deictic gestures and other nonverbal acts, such as demonstrations. We are building a virtual coach that is capable of the same behaviour. In this paper we have taken a closer look at the form, type and timing of deictic gestures in our corpus of human-human coaching interactions. We show that a significant amount of the deictic gestures actually touch the referred object, that most of the gestures are complimentary (contrary to previous research) and often occur before the lexical affiliate.","PeriodicalId":272696,"journal":{"name":"Proceedings of the Workshop on Multimodal Analyses enabling Artificial Agents in Human-Machine Interaction","volume":"13 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-11-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129366758","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 2
Attitude recognition of video bloggers using audio-visual descriptors
F. Haider, L. Cerrato, S. Luz, N. Campbell
In social media, vlogs (video blogs) are a form of unidirectional communication, where the vloggers (video bloggers) convey their messages (opinions, thoughts, etc.) to a potential audience which cannot give them feedback in real time. In this kind of communication, the non-verbal behaviour and personality impression of a video blogger tends to influence viewers' attention, because non-verbal cues are correlated with the messages conveyed by a vlogger. In this study, we use acoustic and visual features (body movements captured by low-level visual descriptors) to predict six different attitudes (amusement, enthusiasm, friendliness, frustration, impatience and neutral) annotated in the speech of 10 video bloggers. Automatic detection of attitude can be helpful in a scenario where a machine has to automatically give bloggers feedback on how well they manage to engage the audience by displaying certain attitudes. Attitude recognition models are trained using a random forest classifier. Results show that: 1) acoustic features provide better accuracy than visual features, 2) while fusion of audio and visual features does not increase overall accuracy, it improves the results for some attitudes and subjects, and 3) densely extracted histograms of flow provide better results than other visual descriptors. A three-class (positive, negative and neutral attitudes) problem has also been defined. Results for this setting show that feature fusion degrades overall classifier accuracy, and the classifiers perform better on the original six-class problem than on the three-class setting.
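The classification setup lends itself to a short sketch: a random forest trained on acoustic features, visual descriptors, and their early fusion. The data below is synthetic and the feature dimensions are assumptions, so it illustrates the comparison rather than reproducing the paper's results.

```python
# Sketch of six-class attitude recognition with a random forest; synthetic data only.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(1)
n_segments = 300
acoustic = rng.normal(size=(n_segments, 20))     # assumed prosodic/spectral statistics
visual = rng.normal(size=(n_segments, 30))       # assumed low-level visual descriptors
attitudes = rng.integers(0, 6, size=n_segments)  # amusement ... neutral

for name, X in [("acoustic", acoustic),
                ("visual", visual),
                ("fusion", np.hstack([acoustic, visual]))]:
    clf = RandomForestClassifier(n_estimators=200, random_state=0)
    acc = cross_val_score(clf, X, attitudes, cv=5).mean()
    print(f"{name}: mean cross-validated accuracy = {acc:.2f}")
```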
{"title":"Attitude recognition of video bloggers using audio-visual descriptors","authors":"F. Haider, L. Cerrato, S. Luz, N. Campbell","doi":"10.1145/3011263.3011270","DOIUrl":"https://doi.org/10.1145/3011263.3011270","url":null,"abstract":"In social media, vlogs (video blogs) are a form of unidirectional communication, where the vloggers (video bloggers) convey their messages (opinions, thoughts, etc.) to a potential audience which cannot give them feedback in real time. In this kind of communication, the non-verbal behaviour and personality impression of a video blogger tends to influence viewers' attention because non-verbal cues are correlated with the messages conveyed by a vlogger. In this study, we use the acoustic and visual features (body movements that are captured by low-level visual descriptors) to predict the six different attitudes (amusement, enthusiasm, friendliness, frustration, impatience and neutral) annotated in the speech of 10 video bloggers. The automatic detection of attitude can be helpful in a scenario where a machine has to automatically provide feedback to bloggers about their performance in terms of the extent to which they manage to engage the audience by displaying certain attitudes. Attitude recognition models are trained using the random forest classifier. Results show that: 1) acoustic features provide better accuracy than the visual features, 2) while fusion of audio and visual features does not increase overall accuracy, it improves the results for some attitudes and subjects, and 3) densely extracted histograms of flow provide better results than other visual descriptors. A three-class (positive, negative and neutral attitudes) problem has also been defined. Results for this setting show that feature fusion degrades overall classifier accuracy, and the classifiers perform better on the original six-class problem than on the three-class setting.","PeriodicalId":272696,"journal":{"name":"Proceedings of the Workshop on Multimodal Analyses enabling Artificial Agents in Human-Machine Interaction","volume":"14 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-11-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122902222","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 6
Increasing robustness of multimodal interaction via individual interaction histories
Felix Schüssel, F. Honold, N. Bubalo, M. Weber
Multimodal input fusion can be considered a well-researched topic, and yet it is rarely found in real-world applications. One reason for this could be the lack of robustness in real-world situations, especially regarding unimodal recognition technologies like speech and gesture, which tend to produce erroneous inputs that cannot be detected by the subsequent multimodal input fusion mechanism. Previous work implying the possibility of detecting and overcoming such errors through knowledge of individual temporal behaviors has neither provided a real-time implementation nor evaluated the real benefit of such an approach. We present such an implementation, applying individual interaction histories in order to increase the robustness of multimodal inputs within a smartwatch scenario. We show how such knowledge can be created and maintained at runtime, present evaluation data from an experiment conducted in a realistic scenario, and compare the approach to the state of the art known from the literature. Our approach is ready to use in other applications and existing systems, with the prospect of increasing the overall robustness of future multimodal systems.
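One way to picture the use of individual interaction histories is the sketch below: it keeps a per-user record of speech-gesture onset offsets and flags inputs whose timing deviates strongly from that history. It is a simplified illustration under the assumption of roughly Gaussian offsets and a fixed deviation threshold, not the paper's actual algorithm.

```python
# Simplified illustration: flag multimodal inputs whose temporal behaviour deviates
# from an individual user's interaction history.
import statistics

class InteractionHistory:
    def __init__(self, min_samples=5, k=3.0):
        self.offsets = []            # observed speech-gesture onset offsets in seconds
        self.min_samples = min_samples
        self.k = k                   # allowed deviation in standard deviations

    def is_plausible(self, offset):
        """Return False if the offset deviates strongly from this user's history."""
        if len(self.offsets) < self.min_samples:
            return True              # not enough history yet; accept the input
        mean = statistics.fmean(self.offsets)
        std = statistics.pstdev(self.offsets) or 1e-6
        return abs(offset - mean) <= self.k * std

    def update(self, offset):
        self.offsets.append(offset)

history = InteractionHistory()
for o in [0.31, 0.28, 0.35, 0.30, 0.33]:
    history.update(o)
print(history.is_plausible(0.32))    # True: fits the user's usual timing
print(history.is_plausible(2.50))    # False: likely a spurious recognition
```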
{"title":"Increasing robustness of multimodal interaction via individual interaction histories","authors":"Felix Schüssel, F. Honold, N. Bubalo, M. Weber","doi":"10.1145/3011263.3011273","DOIUrl":"https://doi.org/10.1145/3011263.3011273","url":null,"abstract":"Multimodal input fusion can be considered a well researched topic and yet it is rarely found in real world applications. One reason for this could be the lack of robustness in real world situations, especially regarding unimodal recognition technologies like speech and gesture, that tend to produce erroneous inputs that can not be detected by the subsequent multimodal input fusion mechanism. Previous work implying the possibility to detect and overcome such errors through knowledge of individual temporal behaviors has neither provided a real-time implementation nor evaluated the real benefit of such an approach. We present such an implementation of applying individual interaction histories in order to increase the robustness of multimodal inputs within a smartwatch scenario. We show how such knowledge can be created and maintained at runtime, present evaluation data from an experiment conducted in a realistic scenario, and compare the approach to the state of the art known from literature. Our approach is ready to use in other applications and existing systems, with the prospect to increase the overall robustness of future multimodal systems.","PeriodicalId":272696,"journal":{"name":"Proceedings of the Workshop on Multimodal Analyses enabling Artificial Agents in Human-Machine Interaction","volume":"43 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-11-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128419500","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 1
Analysis of gesture frequency and amplitude as a function of personality in virtual agents
Alex Rayón, Timothy Gonzalez, D. Novick
Embodied conversational agents (ECAs) are changing the way humans interact with technology. In order to develop humanlike ECAs, they need to be able to perform the natural gestures that are used in day-to-day conversation. Gestures can give insight into an ECA's personality trait of extraversion, but which factors contribute to this is still being explored. Our study focuses on two aspects of gesture: amplitude and frequency. Our goal is to find out whether agents should use specific gestures more frequently than others depending on the personality type they have been designed with. We also quantify gesture amplitude and compare it to a previous study on the perceived naturalness of an agent's gestures. Our results showed some indication that introverts and extraverts judge the agent's naturalness similarly. The larger the amplitude our agent used, the more natural its gestures were perceived to be. Gesture frequency seems to show hardly any difference between extraverts and introverts, even in terms of the types of gesture used.
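A minimal sketch of how the two gesture measures might be quantified from motion-capture keypoints follows; the data and the amplitude definition (maximum wrist-shoulder distance) are assumptions for illustration, not necessarily the authors' operationalization.

```python
# Toy quantification of gesture amplitude and frequency from keypoint trajectories.
import numpy as np

def gesture_amplitude(wrist_xyz, shoulder_xyz):
    """Maximum wrist-shoulder distance (in metres) over the frames of one gesture."""
    return float(np.linalg.norm(wrist_xyz - shoulder_xyz, axis=1).max())

rng = np.random.default_rng(2)
wrist = rng.normal(loc=[0.3, 0.0, 0.0], scale=0.1, size=(90, 3))  # 3 s of frames at 30 fps
shoulder = np.zeros((90, 3))
print(f"amplitude: {gesture_amplitude(wrist, shoulder):.2f} m")

n_gestures, speaking_minutes = 24, 6.0
print(f"frequency: {n_gestures / speaking_minutes:.1f} gestures per minute")
```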
{"title":"Analysis of gesture frequency and amplitude as a function of personality in virtual agents","authors":"Alex Rayón, Timothy Gonzalez, D. Novick","doi":"10.1145/3011263.3011266","DOIUrl":"https://doi.org/10.1145/3011263.3011266","url":null,"abstract":"Embodied conversational agents are changing the way humans interact with technology. In order to develop humanlike ECAs they need to be able to perform natural gestures that are used in day-to-day conversation. Gestures can give insight into an ECAs personality trait of extraversion, but what factors into it is still being explored. Our study focuses on two aspects of gesture: amplitude and frequency. Our goal is to find out whether agents should use specific gestures more frequently than others depending on the personality type they have been designed with. We also look to quantify gesture amplitude and compare it to a previous study on the perception of an agent's naturalness of its gestures. Our results showed some indication that introverts and extraverts judge the agent's naturalness similarly. The larger the amplitude our agent used, the more natural its gestures were perceived. The frequency of gestures between extraverts and introverts seem to contain hardly any difference, even in terms of types of gesture used.","PeriodicalId":272696,"journal":{"name":"Proceedings of the Workshop on Multimodal Analyses enabling Artificial Agents in Human-Machine Interaction","volume":"33 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-11-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116867736","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 3
Proceedings of the Workshop on Multimodal Analyses enabling Artificial Agents in Human-Machine Interaction
Ronald Böck, Francesca Bonin, N. Campbell, R. Poppe
{"title":"Proceedings of the Workshop on Multimodal Analyses enabling Artificial Agents in Human-Machine Interaction","authors":"Ronald Böck, Francesca Bonin, N. Campbell, R. Poppe","doi":"10.1145/3011263","DOIUrl":"https://doi.org/10.1145/3011263","url":null,"abstract":"","PeriodicalId":272696,"journal":{"name":"Proceedings of the Workshop on Multimodal Analyses enabling Artificial Agents in Human-Machine Interaction","volume":"33 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-11-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128097069","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
Annotation and analysis of listener's engagement based on multi-modal behaviors
K. Inoue, Divesh Lala, Shizuka Nakamura, K. Takanashi, Tatsuya Kawahara
We address the annotation of engagement in the context of human-machine interaction. Engagement represents how interested a user is in the current interaction and how willing they are to continue it. The conversational data used in the annotation work is a human-robot interaction corpus in which a human subject talks with the android ERICA, which is remotely operated by another human subject. The annotation work was done by multiple third-party annotators, and the task was to detect the time points at which the level of engagement becomes high. The annotation results indicate that the annotators agree with each other, although the number of annotated points differs among them. It is also found that the level of engagement is related to turn-taking behaviors. Furthermore, we conducted interviews with the annotators to reveal the behaviors used to signal a high level of engagement. The results suggest that laughing, backchannels and nodding are related to the level of engagement.
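A minimal sketch of one way such time-point annotations could be compared across annotators, using a hypothetical two-second tolerance window (the paper does not prescribe this exact metric):

```python
# Toy agreement measure between two annotators' "engagement became high" time points.
def matched_fraction(points_a, points_b, tol=2.0):
    """Fraction of annotator A's time points lying within tol seconds of one of B's."""
    if not points_a:
        return 0.0
    return sum(any(abs(a - b) <= tol for b in points_b) for a in points_a) / len(points_a)

annotator_1 = [12.0, 55.3, 130.8]
annotator_2 = [11.2, 57.0, 95.4, 131.5]
print(f"A1 points matched by A2: {matched_fraction(annotator_1, annotator_2):.0%}")
print(f"A2 points matched by A1: {matched_fraction(annotator_2, annotator_1):.0%}")
```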
{"title":"Annotation and analysis of listener's engagement based on multi-modal behaviors","authors":"K. Inoue, Divesh Lala, Shizuka Nakamura, K. Takanashi, Tatsuya Kawahara","doi":"10.1145/3011263.3011271","DOIUrl":"https://doi.org/10.1145/3011263.3011271","url":null,"abstract":"We address the annotation of engagement in the context of human-machine interaction. Engagement represents the level of how much a user is being interested in and willing to continue the current interaction. The conversational data used in the annotation work is a human-robot interaction corpus where a human subject talks with the android ERICA, which is remotely operated by another human subject. The annotation work was done by multiple third-party annotators, and the task was to detect the time point when the level of engagement becomes high. The annotation results indicate that there are agreements among the annotators although the numbers of annotated points are different among them. It is also found that the level of engagement is related to turn-taking behaviors. Furthermore, we conducted interviews with the annotators to reveal behaviors used to show a high level of engagement. The results suggest that laughing, backchannels and nodding are related to the level of engagement.","PeriodicalId":272696,"journal":{"name":"Proceedings of the Workshop on Multimodal Analyses enabling Artificial Agents in Human-Machine Interaction","volume":"15 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-11-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127549476","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 10
Automatic annotation of gestural units in spontaneous face-to-face interaction
Simon Alexanderson, D. House, J. Beskow
Speech and gesture co-occur in spontaneous dialogue in a highly complex fashion. There is a large variability in the motion that people exhibit during a dialogue, and different kinds of motion occur during different states of the interaction. A wide range of multimodal interface applications, for example in the fields of virtual agents or social robots, can be envisioned where it is important to be able to automatically identify gestures that carry information and discriminate them from other types of motion. While it is easy for a human to distinguish and segment manual gestures from a flow of multimodal information, the same task is not trivial to perform for a machine. In this paper we present a method to automatically segment and label gestural units from a stream of 3D motion capture data. The gestural flow is modeled with a 2-level Hierarchical Hidden Markov Model (HHMM) where the sub-states correspond to gesture phases. The model is trained based on labels of complete gesture units and self-adaptive manipulators. The model is tested and validated on two datasets differing in genre and in method of capturing motion, and outperforms a state-of-the-art SVM classifier on a publicly available dataset.
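For illustration, the sketch below segments synthetic per-frame hand-speed features with a flat two-state Gaussian HMM (rest vs. gesture) using the third-party hmmlearn package. The paper's actual model is a two-level hierarchical HMM whose sub-states correspond to gesture phases, so this is only a simplified stand-in for the segmentation step.

```python
# Simplified stand-in: segment motion frames into rest/gesture states with a flat HMM.
# Requires the third-party package `hmmlearn` (pip install hmmlearn).
import numpy as np
from hmmlearn.hmm import GaussianHMM

rng = np.random.default_rng(3)
rest = rng.normal(loc=0.02, scale=0.01, size=(200, 1))      # low hand speed per frame
gesture = rng.normal(loc=0.15, scale=0.05, size=(100, 1))   # higher hand speed
speeds = np.vstack([rest, gesture, rest])                    # one synthetic recording

model = GaussianHMM(n_components=2, covariance_type="diag", n_iter=50, random_state=0)
model.fit(speeds)
states = model.predict(speeds)                               # frame-level state labels

# Collapse the state sequence into (start_frame, end_frame, state) segments.
segments, start = [], 0
for i in range(1, len(states) + 1):
    if i == len(states) or states[i] != states[start]:
        segments.append((start, i - 1, int(states[start])))
        start = i
print(segments)
```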
{"title":"Automatic annotation of gestural units in spontaneous face-to-face interaction","authors":"Simon Alexanderson, D. House, J. Beskow","doi":"10.1145/3011263.3011268","DOIUrl":"https://doi.org/10.1145/3011263.3011268","url":null,"abstract":"Speech and gesture co-occur in spontaneous dialogue in a highly complex fashion. There is a large variability in the motion that people exhibit during a dialogue, and different kinds of motion occur during different states of the interaction. A wide range of multimodal interface applications, for example in the fields of virtual agents or social robots, can be envisioned where it is important to be able to automatically identify gestures that carry information and discriminate them from other types of motion. While it is easy for a human to distinguish and segment manual gestures from a flow of multimodal information, the same task is not trivial to perform for a machine. In this paper we present a method to automatically segment and label gestural units from a stream of 3D motion capture data. The gestural flow is modeled with a 2-level Hierarchical Hidden Markov Model (HHMM) where the sub-states correspond to gesture phases. The model is trained based on labels of complete gesture units and self-adaptive manipulators. The model is tested and validated on two datasets differing in genre and in method of capturing motion, and outperforms a state-of-the-art SVM classifier on a publicly available dataset.","PeriodicalId":272696,"journal":{"name":"Proceedings of the Workshop on Multimodal Analyses enabling Artificial Agents in Human-Machine Interaction","volume":"16 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-11-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114748909","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 6
Fitmirror: a smart mirror for positive affect in everyday user morning routines
Daniel Besserer, Johannes Bäurle, Alexander Nikic, F. Honold, Felix Schüssel, M. Weber
This paper discusses the concept of a smart mirror for healthier living, the FitMirror. Many people have serious problems getting up after sleeping or getting motivated for the day, or are tired and in a bad mood in the morning. The goal of FitMirror is to positively affect users' feelings by increasing their motivation, mood and feeling of fitness. While concepts for these isolated problems exist, none of them combines them into one system. FitMirror is implemented to combine them and evaluate them in a study. It consists of a monitor with spy-foil, a Microsoft Kinect v2 and a Wii Balance Board, and can recognize users and their gestures with these elements. Several hypotheses about the system regarding motivation, fun, difficulty and getting awake were investigated. Participants were grouped by whether they were sportspersons and morning persons to investigate the effect of these factors. Results show that FitMirror can help users wake up in the morning, raise their motivation to do sports and motivate them for the day.
{"title":"Fitmirror: a smart mirror for positive affect in everyday user morning routines","authors":"Daniel Besserer, Johannes Bäurle, Alexander Nikic, F. Honold, Felix Schüssel, M. Weber","doi":"10.1145/3011263.3011265","DOIUrl":"https://doi.org/10.1145/3011263.3011265","url":null,"abstract":"This paper will discuss the concept of a smart mirror for healthier living, the FitMirror. Many people have serious problems to get up after sleeping, to get motivated for the day, or are tired and in a bad mood in the morning. The goal of FitMirror is to positively affect the user's feelings by increasing his/her motivation, mood and feeling of fitness. While concepts for these isolated problems exist, none of these combine them into one system. FitMirror is implemented to combine them and evaluate them in a study. It consists of a monitor with spy-foil, a Microsoft Kinect v2 and a Wii Balance Board and can recognize users and their gestures with these elements. Several hypotheses about the system regarding motivation, fun, difficulty and getting awake were investigated. Participants were grouped by the factors sportspersons and morning persons to investigate the effect based on these aspects. Results show that FitMirror can help users get awake in the morning, raise their motivation to do sports and motivate them for the day.","PeriodicalId":272696,"journal":{"name":"Proceedings of the Workshop on Multimodal Analyses enabling Artificial Agents in Human-Machine Interaction","volume":"2 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-11-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128431675","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 33
Body movements and laughter recognition: experiments in first encounter dialogues
Kristiina Jokinen, Trung Ngo Trong, G. Wilcock
This paper reports work on automatic analysis of laughter and human body movements in a video corpus of human-human dialogues. We use the Nordic First Encounters video corpus where participants meet each other for the first time. This corpus has manual annotations of participants' head, hand and body movements as well as laughter occurrences. We employ machine learning methods to analyse the corpus using two types of features: visual features that describe bounding boxes around participants' heads and bodies, automatically detecting body movements in the video, and audio speech features based on the participants' spoken contributions. We then correlate the speech and video features and apply neural network techniques to predict if a person is laughing or not given a sequence of video features. The hypothesis is that laughter occurrences and body movement are synchronized, or at least there is a significant relation between laughter activities and occurrences of body movements. Our results confirm the hypothesis of the synchrony of body movements with laughter, but we also emphasise the complexity of the problem and the need for further investigations on the feature sets and the algorithm used.
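A minimal sketch of the prediction task, with synthetic bounding-box movement features and an off-the-shelf scikit-learn MLP standing in for the paper's neural network:

```python
# Sketch: predict laughter from short windows of body-movement features; synthetic data,
# and a simple feed-forward network instead of the paper's actual model.
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(4)
n_windows, frames_per_window, n_feats = 400, 25, 4   # e.g. per-frame bounding-box deltas (assumed)
seqs = rng.normal(size=(n_windows, frames_per_window, n_feats))
laughing = rng.integers(0, 2, size=n_windows)        # laughter label per window

# Summarise each window (mean and std per feature) so a feed-forward net can use it.
X = np.hstack([seqs.mean(axis=1), seqs.std(axis=1)])
clf = MLPClassifier(hidden_layer_sizes=(32,), max_iter=500, random_state=0)
print(f"mean cross-validated accuracy: {cross_val_score(clf, X, laughing, cv=5).mean():.2f}")
```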
{"title":"Body movements and laughter recognition: experiments in first encounter dialogues","authors":"Kristiina Jokinen, Trung Ngo Trong, G. Wilcock","doi":"10.1145/3011263.3011264","DOIUrl":"https://doi.org/10.1145/3011263.3011264","url":null,"abstract":"This paper reports work on automatic analysis of laughter and human body movements in a video corpus of human-human dialogues. We use the Nordic First Encounters video corpus where participants meet each other for the first time. This corpus has manual annotations of participants' head, hand and body movements as well as laughter occurrences. We employ machine learning methods to analyse the corpus using two types of features: visual features that describe bounding boxes around participants' heads and bodies, automatically detecting body movements in the video, and audio speech features based on the participants' spoken contributions. We then correlate the speech and video features and apply neural network techniques to predict if a person is laughing or not given a sequence of video features. The hypothesis is that laughter occurrences and body movement are synchronized, or at least there is a significant relation between laughter activities and occurrences of body movements. Our results confirm the hypothesis of the synchrony of body movements with laughter, but we also emphasise the complexity of the problem and the need for further investigations on the feature sets and the algorithm used.","PeriodicalId":272696,"journal":{"name":"Proceedings of the Workshop on Multimodal Analyses enabling Artificial Agents in Human-Machine Interaction","volume":"6 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-11-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127932950","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 7