
Latest publications from the Proceedings of the 2015 ACM on International Conference on Multimodal Interaction

A Multimodal System for Public Speaking with Real Time Feedback
F. Dermody, Alistair Sutherland
We have developed a multimodal prototype for public speaking with real time feedback using the Microsoft Kinect. Effective speaking involves use of gesture, facial expression, posture, voice as well as the spoken word. These modalities combine to give the appearance of self-confidence in the speaker. This initial prototype detects body pose, facial expressions and voice. Visual and text feedback is displayed in real time to the user using a video panel, icon panel and text feedback panel. The user can also set and view elapsed time during their speaking performance. Real time feedback is displayed on gaze direction, body pose and gesture, vocal tonality, vocal dysfluencies and speaking rate.
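For illustration only, the sketch below shows one way such real-time feedback could be wired up: per-modality cue values are checked against simple thresholds and turned into text-panel messages, with elapsed speaking time shown alongside. The cue names, thresholds, and the get_current_cues() stub are hypothetical placeholders, not the prototype's actual Kinect-based detectors.

```python
# Hypothetical sketch of a cue-to-feedback loop; the cue names, thresholds, and
# the get_current_cues() stub are illustrative, not the prototype's detectors.
import time

FEEDBACK_RULES = [
    # (cue name, predicate on the cue value, message for the text feedback panel)
    ("speaking_rate_wpm",   lambda v: v > 170, "Slow down: you are speaking too fast."),
    ("speaking_rate_wpm",   lambda v: v < 110, "Speed up: your pace is quite slow."),
    ("gaze_on_audience",    lambda v: v < 0.5, "Look at your audience more often."),
    ("filler_rate_per_min", lambda v: v > 6,   "Watch the filler words (um, uh)."),
]

def get_current_cues():
    """Stub standing in for the Kinect-based pose, face, and voice detectors."""
    return {"speaking_rate_wpm": 182, "gaze_on_audience": 0.8, "filler_rate_per_min": 3}

def feedback_loop(duration_s=10.0, period_s=1.0):
    start = time.time()
    while time.time() - start < duration_s:
        cues = get_current_cues()
        messages = [msg for name, pred, msg in FEEDBACK_RULES if pred(cues[name])]
        elapsed = int(time.time() - start)  # elapsed speaking time shown to the user
        print(f"[{elapsed:3d}s] " + ("; ".join(messages) if messages else "Looking good."))
        time.sleep(period_s)

if __name__ == "__main__":
    feedback_loop(duration_s=3.0)
```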
{"title":"A Multimodal System for Public Speaking with Real Time Feedback","authors":"F. Dermody, Alistair Sutherland","doi":"10.1145/2818346.2823295","DOIUrl":"https://doi.org/10.1145/2818346.2823295","url":null,"abstract":"We have developed a multimodal prototype for public speaking with real time feedback using the Microsoft Kinect. Effective speaking involves use of gesture, facial expression, posture, voice as well as the spoken word. These modalities combine to give the appearance of self-confidence in the speaker. This initial prototype detects body pose, facial expressions and voice. Visual and text feedback is displayed in real time to the user using a video panel, icon panel and text feedback panel. The user can also set and view elapsed time during their speaking performance. Real time feedback is displayed on gaze direction, body pose and gesture, vocal tonality, vocal dysfluencies and speaking rate.","PeriodicalId":20486,"journal":{"name":"Proceedings of the 2015 ACM on International Conference on Multimodal Interaction","volume":"38 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2015-11-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"85853786","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 12
I Would Hire You in a Minute: Thin Slices of Nonverbal Behavior in Job Interviews
L. Nguyen, D. Gática-Pérez
In everyday life, judgments people make about others are based on brief excerpts of interactions, known as thin slices. Inferences stemming from such minimal information can be quite accurate, and nonverbal behavior plays an important role in the impression formation. Because protagonists are strangers, employment interviews are a case where both nonverbal behavior and thin slices can be predictive of outcomes. In this work, we analyze the predictive validity of thin slices of real job interviews, where slices are defined by the sequence of questions in a structured interview format. We approach this problem from an audio-visual, dyadic, and nonverbal perspective, where sensing, cue extraction, and inference are automated. Our study shows that although nonverbal behavioral cues extracted from thin slices were not as predictive as when extracted from the full interaction, they were still predictive of hirability impressions with $R^2$ values up to $0.34$, which was comparable to the predictive validity of human observers on thin slices. Applicant audio cues were found to yield the most accurate results.
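The reported numbers come from regressing hirability impressions on automatically extracted nonverbal cues and measuring R^2. A minimal sketch of that evaluation pattern is shown below, using ridge regression and cross-validated R^2 on synthetic data; the cue set and data are illustrative stand-ins, not the paper's corpus or method.

```python
# Sketch of regressing hirability impressions on nonverbal cues and reporting
# cross-validated R^2; the synthetic data and cue set are illustrative only.
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n_interviews = 60
# Hypothetical per-slice cues: speaking-time ratio, pitch variation,
# gaze-at-interviewer ratio, nods per minute.
X = rng.normal(size=(n_interviews, 4))
# Synthetic hirability ratings loosely driven by the first two (audio) cues.
y = 0.6 * X[:, 0] + 0.4 * X[:, 1] + rng.normal(scale=0.8, size=n_interviews)

r2 = cross_val_score(Ridge(alpha=1.0), X, y, cv=5, scoring="r2")
print(f"cross-validated R^2: {r2.mean():.2f} +/- {r2.std():.2f}")
```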
{"title":"I Would Hire You in a Minute: Thin Slices of Nonverbal Behavior in Job Interviews","authors":"L. Nguyen, D. Gática-Pérez","doi":"10.1145/2818346.2820760","DOIUrl":"https://doi.org/10.1145/2818346.2820760","url":null,"abstract":"In everyday life, judgments people make about others are based on brief excerpts of interactions, known as thin slices. Inferences stemming from such minimal information can be quite accurate, and nonverbal behavior plays an important role in the impression formation. Because protagonists are strangers, employment interviews are a case where both nonverbal behavior and thin slices can be predictive of outcomes. In this work, we analyze the predictive validity of thin slices of real job interviews, where slices are defined by the sequence of questions in a structured interview format. We approach this problem from an audio-visual, dyadic, and nonverbal perspective, where sensing, cue extraction, and inference are automated. Our study shows that although nonverbal behavioral cues extracted from thin slices were not as predictive as when extracted from the full interaction, they were still predictive of hirability impressions with $R^2$ values up to $0.34$, which was comparable to the predictive validity of human observers on thin slices. Applicant audio cues were found to yield the most accurate results.","PeriodicalId":20486,"journal":{"name":"Proceedings of the 2015 ACM on International Conference on Multimodal Interaction","volume":"55 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2015-11-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"90852149","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 41
Session details: Doctoral Consortium
C. Busso
{"title":"Session details: Doctoral Consortium","authors":"C. Busso","doi":"10.1145/3252454","DOIUrl":"https://doi.org/10.1145/3252454","url":null,"abstract":"","PeriodicalId":20486,"journal":{"name":"Proceedings of the 2015 ACM on International Conference on Multimodal Interaction","volume":"28 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2015-11-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"89199725","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
Accuracy vs. Availability Heuristic in Multimodal Affect Detection in the Wild
Nigel Bosch, Huili Chen, S. D’Mello, R. Baker, V. Shute
This paper discusses multimodal affect detection from a fusion of facial expressions and interaction features derived from students' interactions with an educational game in the noisy real-world context of a computer-enabled classroom. Log data of students' interactions with the game and face videos from 133 students were recorded in a computer-enabled classroom over a two-day period. Human observers live-annotated learning-centered affective states such as engagement, confusion, and frustration. The face-only detectors were more accurate than interaction-only detectors. Multimodal affect detectors did not show any substantial improvement in accuracy over the face-only detectors. However, the face-only detectors were only applicable to 65% of the cases due to face registration errors caused by excessive movement, occlusion, poor lighting, and other factors. Multimodal fusion techniques were able to improve the applicability of detectors to 98% of cases without sacrificing classification accuracy. Balancing the accuracy vs. applicability tradeoff appears to be an important feature of multimodal affect detection.
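One simple way to picture the accuracy-versus-applicability tradeoff is a fallback fusion policy: use the more accurate face-based prediction whenever face registration succeeds, and fall back to the always-available interaction-based prediction otherwise. The sketch below illustrates only that policy with placeholder detectors; it is not the paper's actual fusion technique.

```python
# Illustrative fallback fusion: prefer the face-based prediction when face
# registration succeeded, otherwise fall back to the interaction-based one.
# Both detectors here are placeholders, not the paper's trained models.
from typing import Optional

def face_detector(frame_features: Optional[dict]) -> Optional[float]:
    """Return P(affective state present), or None when no face was registered."""
    if frame_features is None:  # excessive movement, occlusion, poor lighting, ...
        return None
    return frame_features.get("face_score", 0.5)

def interaction_detector(log_features: dict) -> float:
    """Always applicable: estimate from interaction-log features."""
    return log_features.get("interaction_score", 0.5)

def fused_prediction(frame_features: Optional[dict], log_features: dict) -> float:
    face_p = face_detector(frame_features)
    return face_p if face_p is not None else interaction_detector(log_features)

# Face registered -> the more accurate channel is used; face missing -> the
# detector still applies, which is what raises overall applicability.
print(fused_prediction({"face_score": 0.82}, {"interaction_score": 0.40}))  # 0.82
print(fused_prediction(None, {"interaction_score": 0.40}))                  # 0.40
```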
{"title":"Accuracy vs. Availability Heuristic in Multimodal Affect Detection in the Wild","authors":"Nigel Bosch, Huili Chen, S. D’Mello, R. Baker, V. Shute","doi":"10.1145/2818346.2820739","DOIUrl":"https://doi.org/10.1145/2818346.2820739","url":null,"abstract":"This paper discusses multimodal affect detection from a fusion of facial expressions and interaction features derived from students' interactions with an educational game in the noisy real-world context of a computer-enabled classroom. Log data of students' interactions with the game and face videos from 133 students were recorded in a computer-enabled classroom over a two day period. Human observers live annotated learning-centered affective states such as engagement, confusion, and frustration. The face-only detectors were more accurate than interaction-only detectors. Multimodal affect detectors did not show any substantial improvement in accuracy over the face-only detectors. However, the face-only detectors were only applicable to 65% of the cases due to face registration errors caused by excessive movement, occlusion, poor lighting, and other factors. Multimodal fusion techniques were able to improve the applicability of detectors to 98% of cases without sacrificing classification accuracy. Balancing the accuracy vs. applicability tradeoff appears to be an important feature of multimodal affect detection.","PeriodicalId":20486,"journal":{"name":"Proceedings of the 2015 ACM on International Conference on Multimodal Interaction","volume":"43 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2015-11-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"88503240","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 34
Deciphering the Silent Participant: On the Use of Audio-Visual Cues for the Classification of Listener Categories in Group Discussions
Catharine Oertel, Kenneth Alberto Funes Mora, Joakim Gustafson, J. Odobez
Estimating a silent participant's degree of engagement and their role within a group discussion can be challenging, as there are no speech-related cues available at the given time. Having this information available, however, can provide important insights into the dynamics of the group as a whole. In this paper, we study the classification of listeners into several categories (attentive listener, side participant, and bystander). We devised a thin-sliced perception test in which subjects were asked to assess listener roles and engagement levels in 15-second video clips taken from a corpus of group interviews. Results show that humans are usually able to assess silent participant roles. Using these annotations together with a set of multimodal low-level features, such as past speaking activity, backchannels (both visual and verbal), and gaze patterns, we identified the features that distinguish between the different listener categories. Moreover, the results show that many of the audio-visual effects observed on listeners in dyadic interactions also hold for multi-party interactions. A preliminary classifier achieves an accuracy of 64%.
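A minimal sketch of this kind of setup follows: a multiclass classifier over synthetic low-level cues (gaze-at-speaker ratio, backchannel counts, recent speaking activity) with cross-validated accuracy and feature importances. The data, feature set, and classifier are illustrative, not the annotated corpus or the preliminary classifier from the paper.

```python
# Toy multiclass listener-category classifier over low-level cues; the synthetic
# features and labels stand in for the paper's annotated group-interview corpus.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(1)
labels = ["attentive_listener", "side_participant", "bystander"]
feature_names = ["gaze_at_speaker", "backchannels_per_clip", "spoke_recently"]

X_parts, y = [], []
for k, label in enumerate(labels):
    n = 40
    gaze = rng.beta(6 - 2 * k, 2, n)                      # higher for attentive listeners
    backchannels = rng.poisson(3 - k, n)                  # fewer for bystanders
    spoke_recently = rng.binomial(1, [0.6, 0.4, 0.1][k], n)
    X_parts.append(np.column_stack([gaze, backchannels, spoke_recently]))
    y += [label] * n
X, y = np.vstack(X_parts), np.array(y)

clf = RandomForestClassifier(n_estimators=100, random_state=0)
print("cross-validated accuracy:", cross_val_score(clf, X, y, cv=5).mean().round(2))
clf.fit(X, y)
for name, imp in zip(feature_names, clf.feature_importances_):
    print(f"{name:22s} importance: {imp:.2f}")
```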
{"title":"Deciphering the Silent Participant: On the Use of Audio-Visual Cues for the Classification of Listener Categories in Group Discussions","authors":"Catharine Oertel, Kenneth Alberto Funes Mora, Joakim Gustafson, J. Odobez","doi":"10.1145/2818346.2820759","DOIUrl":"https://doi.org/10.1145/2818346.2820759","url":null,"abstract":"Estimating a silent participant's degree of engagement and his role within a group discussion can be challenging, as there are no speech related cues available at the given time. Having this information available, however, can provide important insights into the dynamics of the group as a whole. In this paper, we study the classification of listeners into several categories (attentive listener, side participant and bystander). We devised a thin-sliced perception test where subjects were asked to assess listener roles and engagement levels in 15-second video-clips taken from a corpus of group interviews. Results show that humans are usually able to assess silent participant roles. Using the annotation to identify from a set of multimodal low-level features, such as past speaking activity, backchannels (both visual and verbal), as well as gaze patterns, we could identify the features which are able to distinguish between different listener categories. Moreover, the results show that many of the audio-visual effects observed on listeners in dyadic interactions, also hold for multi-party interactions. A preliminary classifier achieves an accuracy of 64 %.","PeriodicalId":20486,"journal":{"name":"Proceedings of the 2015 ACM on International Conference on Multimodal Interaction","volume":"17 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2015-11-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"86895940","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 23
Multimodal Fusion using Respiration and Gaze for Predicting Next Speaker in Multi-Party Meetings
Ryo Ishii, Shiro Kumano, K. Otsuka
Techniques that use nonverbal behaviors to predict turn-taking situations, such as who will be the next speaker and the next utterance timing in multi-party meetings, are receiving a lot of attention recently. It has long been known that gaze is a physical behavior that plays an important role in transferring the speaking turn between humans. Recently, a line of research has focused on the relationship between turn-taking and respiration, a biological signal that conveys information about the intention or preliminary action to start to speak. It has been demonstrated that respiration and gaze behavior separately have the potential to allow predicting the next speaker and the next utterance timing in multi-party meetings. As a multimodal fusion to create models for predicting the next speaker in multi-party meetings, we integrated respiration and gaze behavior, which were extracted from different modalities and are completely different in quality, and implemented a model that uses information about them to predict the next speaker at the end of an utterance. The model uses two-step processing. The first step is to predict whether turn-keeping or turn-taking happens; the second is to predict the next speaker in turn-taking. We constructed prediction models with either respiration or gaze behavior and with both respiration and gaze behaviors as features and compared their performance. The results suggest that the model with both respiration and gaze behaviors performs better than the one using only respiration or only gaze behavior. This reveals that multimodal fusion using respiration and gaze behavior is effective for predicting the next speaker in multi-party meetings. We also found that gaze behavior is more useful for predicting turn-keeping/turn-taking than respiration, and that respiration is more useful for predicting the next speaker in turn-taking.
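The two-step structure described above can be sketched as two chained classifiers: the first decides turn-keeping vs. turn-taking from utterance-end features, and the second, only in the turn-taking case, scores each listener as the possible next speaker. The logistic models, synthetic training data, and feature names (inhalation depth, gaze ratios) below are placeholders, not the paper's prediction models.

```python
# Two-step next-speaker prediction sketch: stage 1 decides turn-keeping vs.
# turn-taking at the end of an utterance; stage 2 picks the next speaker among
# the listeners. The logistic models, synthetic data, and feature names are
# placeholders standing in for the paper's respiration/gaze-based models.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Stage 1 features (per utterance end): speaker inhalation depth, speaker
# gaze-at-listener ratio.  Label: 1 = turn-taking, 0 = turn-keeping.
X1 = rng.normal(size=(200, 2))
y1 = (X1[:, 0] + 0.5 * X1[:, 1] + rng.normal(scale=0.5, size=200) > 0).astype(int)
stage1 = LogisticRegression().fit(X1, y1)

# Stage 2 features (per listener): listener inhalation depth, mutual-gaze
# duration with the current speaker.  Label: 1 = became the next speaker.
X2 = rng.normal(size=(200, 2))
y2 = (0.8 * X2[:, 0] + X2[:, 1] + rng.normal(scale=0.5, size=200) > 0).astype(int)
stage2 = LogisticRegression().fit(X2, y2)

def predict_next_speaker(utterance_end_features, listener_features):
    """listener_features: {listener_id: [feature, ...]}."""
    if stage1.predict([utterance_end_features])[0] == 0:
        return "current speaker keeps the turn"
    scores = {lid: stage2.predict_proba([f])[0, 1] for lid, f in listener_features.items()}
    return max(scores, key=scores.get)  # most likely next speaker

print(predict_next_speaker([1.2, 0.4], {"A": [0.1, -0.3], "B": [1.5, 0.9]}))
```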
{"title":"Multimodal Fusion using Respiration and Gaze for Predicting Next Speaker in Multi-Party Meetings","authors":"Ryo Ishii, Shiro Kumano, K. Otsuka","doi":"10.1145/2818346.2820755","DOIUrl":"https://doi.org/10.1145/2818346.2820755","url":null,"abstract":"Techniques that use nonverbal behaviors to predict turn-taking situations, such as who will be the next speaker and the next utterance timing in multi-party meetings are receiving a lot of attention recently. It has long been known that gaze is a physical behavior that plays an important role in transferring the speaking turn between humans. Recently, a line of research has focused on the relationship between turn-taking and respiration, a biological signal that conveys information about the intention or preliminary action to start to speak. It has been demonstrated that respiration and gaze behavior separately have the potential to allow predicting the next speaker and the next utterance timing in multi-party meetings. As a multimodal fusion to create models for predicting the next speaker in multi-party meetings, we integrated respiration and gaze behavior, which were extracted from different modalities and are completely different in quality, and implemented a model uses information about them to predict the next speaker at the end of an utterance. The model has a two-step processing. The first is to predict whether turn-keeping or turn-taking happens; the second is to predict the next speaker in turn-taking. We constructed prediction models with either respiration or gaze behavior and with both respiration and gaze behaviors as features and compared their performance. The results suggest that the model with both respiration and gaze behaviors performs better than the one using only respiration or gaze behavior. It is revealed that multimodal fusion using respiration and gaze behavior is effective for predicting the next speaker in multi-party meetings. It was found that gaze behavior is more useful for predicting turn-keeping/turn-taking than respiration and that respiration is more useful for predicting the next speaker in turn-taking.","PeriodicalId":20486,"journal":{"name":"Proceedings of the 2015 ACM on International Conference on Multimodal Interaction","volume":"31 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2015-11-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"91535789","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 19
A Distributed Architecture for Interacting with NAO
Fabien Badeig, Quentin Pelorson, S. Arias, Vincent Drouard, I. D. Gebru, Xiaofei Li, Georgios D. Evangelidis, R. Horaud
One of the main applications of the humanoid robot NAO - a small robot companion - is human-robot interaction (HRI). NAO is particularly well suited for HRI applications because of its design, hardware specifications, programming capabilities, and affordable cost. Indeed, NAO can stand up, walk, wander, dance, play soccer, sit down, recognize and grasp simple objects, detect and identify people, localize sounds, understand some spoken words, engage itself in simple and goal-directed dialogs, and synthesize speech. This is made possible by the robot's 24 degree-of-freedom articulated structure (body, legs, feet, arms, hands, head, etc.), motors, cameras, microphones, etc., as well as by its on-board computing hardware and embedded software, e.g., robot motion control. Nevertheless, the current NAO configuration has two drawbacks that restrict the complexity of interactive behaviors that could potentially be implemented. Firstly, the on-board computing resources are inherently limited, which implies that it is difficult to implement the sophisticated computer vision and audio signal analysis algorithms required by advanced interactive tasks. Secondly, programming new robot functionalities currently implies the development of embedded software, which is a difficult task in its own right, necessitating specialized knowledge. The vast majority of HRI practitioners may not have this kind of expertise and hence they cannot easily and quickly implement their ideas, carry out thorough experimental validations, and design proof-of-concept demonstrators. We have developed a distributed software architecture that attempts to overcome these two limitations. Broadly speaking, NAO's on-board computing resources are augmented with external computing resources. The latter is a computer platform with its CPUs, GPUs, memory, operating system, libraries, software packages, internet access, etc. This configuration enables easy and fast development in Matlab, C, C++, or Python. Moreover, it allows the user to combine on-board libraries (motion control, face detection, etc.) with external toolboxes, e.g., OpenCV.
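The general pattern of such an architecture, pulling raw sensor data off the robot, running the heavy processing on an external workstation, and sending back only lightweight commands, can be sketched as below. The RobotLink class is a hypothetical placeholder for the actual transport or middleware and is not the authors' implementation or the NAOqi API; the off-board step just runs a stock OpenCV face detector as an example of heavy processing.

```python
# Generic "offload to an external computer" sketch. RobotLink is a hypothetical
# placeholder for whatever transport streams NAO's camera frames and accepts
# commands; it is not the authors' middleware or the NAOqi API. The off-board
# step just runs a stock OpenCV face detector as an example of heavy processing.
import numpy as np
import cv2

class RobotLink:
    """Placeholder connection to the robot (e.g., sockets or a middleware bus)."""
    def get_camera_frame(self) -> np.ndarray:
        # A real implementation would receive a frame from the robot's camera.
        return np.zeros((480, 640, 3), dtype=np.uint8)

    def send_command(self, command: str) -> None:
        print("-> robot:", command)

def process_loop(link: RobotLink, n_frames: int = 3) -> None:
    face_cascade = cv2.CascadeClassifier(
        cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
    for _ in range(n_frames):
        frame = link.get_camera_frame()
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        faces = face_cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
        # Heavy vision runs off-board; only a lightweight command goes back.
        link.send_command("track_face" if len(faces) else "scan_room")

process_loop(RobotLink())
```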
{"title":"A Distributed Architecture for Interacting with NAO","authors":"Fabien Badeig, Quentin Pelorson, S. Arias, Vincent Drouard, I. D. Gebru, Xiaofei Li, Georgios D. Evangelidis, R. Horaud","doi":"10.1145/2818346.2823303","DOIUrl":"https://doi.org/10.1145/2818346.2823303","url":null,"abstract":"One of the main applications of the humanoid robot NAO - a small robot companion - is human-robot interaction (HRI). NAO is particularly well suited for HRI applications because of its design, hardware specifications, programming capabilities, and affordable cost. Indeed, NAO can stand up, walk, wander, dance, play soccer, sit down, recognize and grasp simple objects, detect and identify people, localize sounds, understand some spoken words, engage itself in simple and goal-directed dialogs, and synthesize speech. This is made possible due to the robot's 24 degree-of-freedom articulated structure (body, legs, feet, arms, hands, head, etc.), motors, cameras, microphones, etc., as well as to its on-board computing hardware and embedded software, e.g., robot motion control. Nevertheless, the current NAO configuration has two drawbacks that restrict the complexity of interactive behaviors that could potentially be implemented. Firstly, the on-board computing resources are inherently limited, which implies that it is difficult to implement sophisticated computer vision and audio signal analysis algorithms required by advanced interactive tasks. Secondly, programming new robot functionalities currently implies the development of embedded software, which is a difficult task in its own right necessitating specialized knowledge. The vast majority of HRI practitioners may not have this kind of expertise and hence they cannot easily and quickly implement their ideas, carry out thorough experimental validations, and design proof-of-concept demonstrators. We have developed a distributed software architecture that attempts to overcome these two limitations. Broadly speaking, NAO's on-board computing resources are augmented with external computing resources. The latter is a computer platform with its CPUs, GPUs, memory, operating system, libraries, software packages, internet access, etc. This configuration enables easy and fast development in Matlab, C, C++, or Python. Moreover, it allows the user to combine on-board libraries (motion control, face detection, etc.) with external toolboxes, e.g., OpenCv.","PeriodicalId":20486,"journal":{"name":"Proceedings of the 2015 ACM on International Conference on Multimodal Interaction","volume":"74 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2015-11-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"76388108","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 7
Exploiting Multimodal Affect and Semantics to Identify Politically Persuasive Web Videos
Behjat Siddiquie, Dave Chisholm, Ajay Divakaran
We introduce the task of automatically classifying politically persuasive web videos and propose a highly effective multi-modal approach for this task. We extract audio, visual, and textual features that attempt to capture affect and semantics in the audio-visual content and sentiment in the viewers' comments. We demonstrate that each of the feature modalities can be used to classify politically persuasive content, and that fusing them leads to the best performance. We also perform experiments to examine human accuracy and inter-coder reliability for this task and show that our best automatic classifier slightly outperforms average human performance. Finally we show that politically persuasive videos generate more strongly negative viewer comments than non-persuasive videos and analyze how affective content can be used to predict viewer reactions.
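A generic way to combine the three modalities is feature-level (early) fusion: concatenate the per-modality feature vectors, train a single classifier, and compare against each modality alone. The sketch below shows only that pipeline shape on synthetic features and random labels; the features, classifier, and printed accuracies are illustrative, not the paper's descriptors or results.

```python
# Early-fusion pipeline sketch: concatenate per-modality feature vectors and
# train one classifier, comparing against each modality alone. Features and
# labels are synthetic, so the printed accuracies are meaningless; only the
# pipeline shape is the point.
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

rng = np.random.default_rng(0)
n = 120
audio  = rng.normal(size=(n, 8))    # e.g., affect-related prosodic statistics
visual = rng.normal(size=(n, 16))   # e.g., shot- or frame-level visual descriptors
text   = rng.normal(size=(n, 10))   # e.g., viewer-comment sentiment features
y = rng.integers(0, 2, size=n)      # persuasive (1) vs. non-persuasive (0)

fused = np.hstack([audio, visual, text])  # feature-level (early) fusion
for name, X in [("audio", audio), ("visual", visual), ("text", text), ("fused", fused)]:
    acc = cross_val_score(SVC(kernel="linear"), X, y, cv=5).mean()
    print(f"{name:6s} cross-validated accuracy: {acc:.2f}")
```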
{"title":"Exploiting Multimodal Affect and Semantics to Identify Politically Persuasive Web Videos","authors":"Behjat Siddiquie, Dave Chisholm, Ajay Divakaran","doi":"10.1145/2818346.2820732","DOIUrl":"https://doi.org/10.1145/2818346.2820732","url":null,"abstract":"We introduce the task of automatically classifying politically persuasive web videos and propose a highly effective multi-modal approach for this task. We extract audio, visual, and textual features that attempt to capture affect and semantics in the audio-visual content and sentiment in the viewers' comments. We demonstrate that each of the feature modalities can be used to classify politically persuasive content, and that fusing them leads to the best performance. We also perform experiments to examine human accuracy and inter-coder reliability for this task and show that our best automatic classifier slightly outperforms average human performance. Finally we show that politically persuasive videos generate more strongly negative viewer comments than non-persuasive videos and analyze how affective content can be used to predict viewer reactions.","PeriodicalId":20486,"journal":{"name":"Proceedings of the 2015 ACM on International Conference on Multimodal Interaction","volume":"505 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2015-11-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"77345738","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 37
MPHA: A Personal Hearing Doctor Based on Mobile Devices
Yu-Hao Wu, Jia Jia, Wai-Kim Leung, Yejun Liu, Lianhong Cai
As more and more people want to know their hearing level, audiometry is becoming increasingly important. However, the traditional audiometric method requires dedicated audiometers, which are very expensive, and testing is time consuming. In this paper, we present mobile personal hearing assessment (MPHA), a novel interactive mode for testing hearing level based on mobile devices. MPHA 1) provides a general method to calibrate sound intensity for mobile devices, which guarantees the reliability and validity of the audiometry system, and 2) includes an audiometric correction algorithm for real, noisy audiometric environments. The experimental results show that MPHA is reliable and valid compared with conventional audiometric assessment.
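Calibrating sound intensity on a mobile device essentially means mapping a requested test level in dB to a playback amplitude through a per-device, per-frequency reference measured once with a sound level meter. The sketch below shows that dB-to-amplitude mapping for a pure-tone generator; the calibration table and the simple log-scaling are an assumed, generic approach, not MPHA's actual calibration procedure.

```python
# Sketch of the dB-to-amplitude mapping behind audiometer-style tone playback.
# CALIBRATION maps frequency -> dB produced at full-scale amplitude on this
# particular device/headphone pair; the values are placeholders that would be
# measured once with a sound level meter. This is a generic approach, not
# MPHA's actual calibration procedure.
import numpy as np

CALIBRATION = {500: 95.0, 1000: 98.0, 2000: 96.0, 4000: 92.0}  # dB at amplitude 1.0

def tone(freq_hz: int, level_db: float, duration_s: float = 1.0, sr: int = 44100):
    """Return pure-tone samples scaled to play at the requested level."""
    full_scale_db = CALIBRATION[freq_hz]
    amplitude = 10 ** ((level_db - full_scale_db) / 20.0)  # dB difference -> linear gain
    if amplitude > 1.0:
        raise ValueError("Requested level exceeds what this device can produce.")
    t = np.arange(int(sr * duration_s)) / sr
    return amplitude * np.sin(2 * np.pi * freq_hz * t)

samples = tone(1000, level_db=40.0)
print(f"peak amplitude for a 40 dB tone at 1 kHz: {samples.max():.6f}")
```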
{"title":"MPHA: A Personal Hearing Doctor Based on Mobile Devices","authors":"Yu-Hao Wu, Jia Jia, Wai-Kim Leung, Yejun Liu, Lianhong Cai","doi":"10.1145/2818346.2820753","DOIUrl":"https://doi.org/10.1145/2818346.2820753","url":null,"abstract":"As more and more people inquire to know their hearing level condition, audiometry is becoming increasingly important. However, traditional audiometric method requires the involvement of audiometers, which are very expensive and time consuming. In this paper, we present mobile personal hearing assessment (MPHA), a novel interactive mode for testing hearing level based on mobile devices. MPHA, 1) provides a general method to calibrate sound intensity for mobile devices to guarantee the reliability and validity of the audiometry system; 2) designs an audiometric correction algorithm for the real noisy audiometric environment. The experimental results show that MPHA is reliable and valid compared with conventional audiometric assessment.","PeriodicalId":20486,"journal":{"name":"Proceedings of the 2015 ACM on International Conference on Multimodal Interaction","volume":"31 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2015-11-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"79184101","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 2
Micro-opinion Sentiment Intensity Analysis and Summarization in Online Videos
Amir Zadeh
There has been substantial progress in the field of text-based sentiment analysis, but little effort has been made to incorporate other modalities. Previous work in sentiment analysis has shown that using multimodal data yields more accurate models of sentiment. Efforts have been made towards expressing sentiment as a spectrum of intensity rather than just positive or negative. Such models are useful not only for detecting positivity or negativity, but also for giving a score of how positive or negative a statement is. Based on state-of-the-art studies in sentiment analysis, prediction in terms of sentiment score is still far from accurate, even on large datasets [27]. Another challenge in sentiment analysis is dealing with small segments, or micro opinions, as they carry less context than large segments, making sentiment analysis harder. This paper presents a Ph.D. thesis aimed at comprehensive studies of multimodal micro-opinion sentiment intensity analysis.
{"title":"Micro-opinion Sentiment Intensity Analysis and Summarization in Online Videos","authors":"Amir Zadeh","doi":"10.1145/2818346.2823317","DOIUrl":"https://doi.org/10.1145/2818346.2823317","url":null,"abstract":"There has been substantial progress in the field of text based sentiment analysis but little effort has been made to incorporate other modalities. Previous work in sentiment analysis has shown that using multimodal data yields to more accurate models of sentiment. Efforts have been made towards expressing sentiment as a spectrum of intensity rather than just positive or negative. Such models are useful not only for detection of positivity or negativity, but also giving out a score of how positive or negative a statement is. Based on the state of the art studies in sentiment analysis, prediction in terms of sentiment score is still far from accurate, even in large datasets [27]. Another challenge in sentiment analysis is dealing with small segments or micro opinions as they carry less context than large segments thus making analysis of the sentiment harder. This paper presents a Ph.D. thesis shaped towards comprehensive studies in multimodal micro-opinion sentiment intensity analysis.","PeriodicalId":20486,"journal":{"name":"Proceedings of the 2015 ACM on International Conference on Multimodal Interaction","volume":"43 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2015-11-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"74035397","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 26