
Proceedings of the 2020 International Conference on Multimodal Interaction: Latest Publications

LieCatcher: Game Framework for Collecting Human Judgments of Deceptive Speech
Pub Date : 2020-10-21 DOI: 10.1145/3382507.3421166
Sarah Ita Levitan, James Shin, Ivy Chen, Julia Hirschberg
Humans are notoriously poor at detecting deception: most perform worse than chance. To address this issue, we have developed LieCatcher, a single-player web-based Game With A Purpose (GWAP) that allows players to assess their lie-detection skills while providing human judgments of deceptive speech. Players listen to audio recordings drawn from a corpus of deceptive and non-deceptive interview dialogues and guess whether the speaker is lying or telling the truth. They are awarded points for correct guesses, and at the end of the game they receive a score summarizing their performance at lie detection. We present the game's design and implementation, and describe a crowdsourcing experiment conducted to study perceived deception.
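The scoring loop the abstract describes (points for correct guesses, a final summary score) can be sketched minimally. The function name and point value below are hypothetical illustrations, not taken from the LieCatcher implementation:

```python
def score_game(guesses, ground_truth, points_per_hit=10):
    """Score one LieCatcher-style session: compare each lie/truth guess
    against the corpus label and summarize performance at the end."""
    correct = sum(g == t for g, t in zip(guesses, ground_truth))
    return {
        "points": correct * points_per_hit,
        "accuracy": correct / len(ground_truth),
    }

# A player guessing on four recordings, two of them correctly:
result = score_game(["lie", "truth", "lie", "lie"],
                    ["lie", "lie", "truth", "lie"])
print(result)  # → {'points': 20, 'accuracy': 0.5}
```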
Citations: 4
Musical Multimodal Interaction: From Bodies to Ecologies
Pub Date : 2020-10-21 DOI: 10.1145/3382507.3419444
Atau Tanaka
Musical performance can be thought of in multimodal terms - physical interaction with musical instruments produces sound output, often while the performer is visually reading a score. Digital Musical Instrument (DMI) design merges tenets of HCI and musical instrument practice. Audiovisual performance and other forms of multimedia might benefit from multimodal thinking. This keynote revisits two decades of interactive music practice that has paralleled the development of the field of multimodal interaction research. The BioMuse was an early digital musical instrument system using EMG muscle sensing that was extended by a second mode of sensing, allowing effort and position to be two complementary modalities [1]. The Haptic Wave applied principles of cross-modal information display to create a haptic audio editor enabling visually impaired audio producers to 'feel' audio waveforms they could not see in a graphical user interface [2]. VJ culture extends the idea of music DJs to create audiovisual cultural experiences. AVUIs were a set of creative coding tools that enabled the convergence of performance UI and creative visual output [3]. The Orchestra of Rocks is a continuing collaboration with visual artist Uta Kogelsberger that has manifested itself through physical and virtual forms - allowing multimodality over time [4]. Be it a physical exhibition in a gallery or audio reactive 3D animation on YouTube 360, the multiple modes in which an artwork is articulated support its original conceptual foundations. These four projects situate multimodal interaction at the heart of artistic research.
Citations: 0
Facilitating Flexible Force Feedback Design with Feelix
Pub Date : 2020-10-21 DOI: 10.1145/3382507.3418819
Anke van Oosterhout, M. Bruns, Eve E. Hoggan
In the last decade, haptic actuators have improved in quality and efficiency, enabling easier implementation in user interfaces. One of the next steps towards a mature haptics field is a larger and more diverse toolset that enables designers and novices to explore the design and implementation of haptic feedback in their projects. In this paper, we look at several design projects that utilize haptic force feedback to aid interaction between the user and product. We analysed the process interaction designers went through when developing their haptic user interfaces. Based on our insights, we identified requirements for a haptic force feedback authoring tool. We discuss how these requirements are addressed by 'Feelix', a tool that supports sketching and refinement of haptic force feedback effects.
Citations: 7
Punchline Detection using Context-Aware Hierarchical Multimodal Fusion
Pub Date : 2020-10-21 DOI: 10.1145/3382507.3418891
Akshat Choube, M. Soleymani
Humor has a history as old as humanity. It often induces laughter and elicits amusement and engagement. Humorous behavior is manifested in different modalities, including language, voice tone, and gestures; automatic understanding of humorous behavior therefore requires multimodal behavior analysis. Humor detection is a well-established problem in Natural Language Processing, but its multimodal analysis is less explored. In this paper, we present a context-aware hierarchical fusion network for multimodal punchline detection. The proposed neural architecture first fuses the modalities two by two and then fuses all three modalities. The network also models the context of the punchline using Gated Recurrent Units. The model is evaluated on the UR-FUNNY database, yielding state-of-the-art performance.
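The pairwise-then-trimodal fusion scheme can be sketched as follows. The projection function, random weights, and dimensions are illustrative assumptions; the paper's architecture uses learned layers plus GRUs for punchline context:

```python
import math
import random

random.seed(0)

def fuse(vectors, weight):
    """Stand-in for a learned fusion layer: a linear map of the
    concatenated inputs followed by tanh."""
    x = [v for vec in vectors for v in vec]
    return [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in weight]

d = 4  # illustrative per-modality feature dimension
text = [random.gauss(0, 1) for _ in range(d)]
audio = [random.gauss(0, 1) for _ in range(d)]
video = [random.gauss(0, 1) for _ in range(d)]
w_pair = [[random.gauss(0, 0.1) for _ in range(2 * d)] for _ in range(d)]
w_tri = [[random.gauss(0, 0.1) for _ in range(3 * d)] for _ in range(d)]

# Stage 1: fuse the modalities two by two.
ta = fuse([text, audio], w_pair)
tv = fuse([text, video], w_pair)
av = fuse([audio, video], w_pair)

# Stage 2: fuse the three bimodal representations into one vector that a
# punchline classifier would consume.
punchline_repr = fuse([ta, tv, av], w_tri)
```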
Citations: 8
Analyzing Nonverbal Behaviors along with Praising
Pub Date : 2020-10-21 DOI: 10.1145/3382507.3418868
Toshiki Onishi, Arisa Yamauchi, Ryo Ishii, Y. Aono, Akihiro Miyata
In this work, as a first attempt to analyze the relationship between praising skills and human behavior in dialogue, we focus on head and face behavior. We create a new dialogue corpus that includes the face and head behavior of the person who gives praise (the praiser) and the person who receives it (the receiver), together with the degree of success of the praising (the praising score). We also create a machine learning model that uses features related to head and face behavior to estimate the praising score, and we clarify which features of the praiser and receiver are important for this estimation. The analysis showed that features of both the praiser and the receiver matter in estimating the praising score, in particular features related to utterance, head, gaze, and chin. Examining the features of highest importance revealed that the praiser and receiver should face each other without turning their heads to the left or right, and that the longer the praiser's utterance, the more successful the praising.
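A feature-importance analysis of this kind can be sketched with simple correlations. The feature names and coefficients below are invented for illustration (chosen to be consistent with the reported findings: longer utterances and mutual gaze help, head turning hurts); the paper's actual model and feature set differ:

```python
import random

random.seed(1)

def pearson(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

# Toy data: [utterance length (s), head turn away from partner (deg),
# mutual-gaze ratio] -> praising score, with invented coefficients.
rows = []
for _ in range(200):
    utterance = random.uniform(1, 10)
    head_turn = random.uniform(0, 45)
    gaze = random.uniform(0, 1)
    score = 0.3 * utterance - 0.05 * head_turn + 2.0 * gaze + random.gauss(0, 0.3)
    rows.append((utterance, head_turn, gaze, score))

scores = [r[3] for r in rows]
for name, i in [("utterance", 0), ("head_turn", 1), ("gaze", 2)]:
    print(name, round(pearson([r[i] for r in rows], scores), 2))
```

The signs of the recovered correlations (positive for utterance length and gaze, negative for head turning) mirror the qualitative conclusions quoted above.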
Citations: 5
Leniency to those who confess?: Predicting the Legal Judgement via Multi-Modal Analysis
Pub Date : 2020-10-21 DOI: 10.1145/3382507.3418893
Liang Yang, Jingjie Zeng, Tao Peng, Xi Luo, Jinghui Zhang, Hongfei Lin
Legal Judgement Prediction (LJP) is now under the spotlight. It usually consists of multiple sub-tasks, such as penalty prediction (fines and imprisonment) and prediction of the applicable articles of law. Penalty prediction is closely related to the trial process, especially the analysis of the criminal suspect's attitude, which influences the judgment of the presiding judge to some extent. In this paper, we first construct a multi-modal dataset of 517 intentional-assault cases, which contains trial information as well as the attitude of the suspect. Then, we explore the relationship between the suspect's attitude and the term of imprisonment. Finally, we use the proposed multi-modal model to predict the suspect's attitude and compare it with several strong baselines. Our experimental results show that the attitude of the criminal suspect is closely related to penalty prediction, which provides a new perspective for LJP.
Citations: 2
Zero-Shot Learning for Gesture Recognition
Pub Date : 2020-10-21 DOI: 10.1145/3382507.3421161
Naveen Madapana
Zero-Shot Learning (ZSL) is a new paradigm in machine learning that aims to recognize classes that are not present in the training data, and is thus capable of comprehending categories never seen before. While deep learning has pushed the limits of unseen object recognition, ZSL for temporal problems such as unfamiliar gesture recognition (referred to as ZSGL) remains unexplored. ZSGL has the potential to yield efficient human-machine interfaces that can recognize and understand the spontaneous and conversational gestures of humans. The objective of this work is therefore to conceptualize, model, and develop a framework for tackling ZSGL problems. The first step in the pipeline is to develop a database of gesture attributes that are representative of a range of categories. Next, a deep architecture consisting of convolutional and recurrent layers is proposed to jointly optimize the semantic and classification losses. Lastly, rigorous experiments are performed to compare the proposed model with existing ZSL models on the CGD 2013 and MSRC-12 datasets. In our preliminary work, we identified a list of 64 discriminative attributes related to gestures' morphological characteristics. Our approach yields an unseen-class accuracy of 41%, which outperforms the state-of-the-art approaches by a considerable margin. Future work involves the following: 1. Modifying the existing architecture to improve ZSL accuracy; 2. Augmenting the attribute database to incorporate semantic properties; 3. Addressing the data imbalance inherent to ZSL problems; and 4. Expanding this research to other domains such as surgeme and action recognition.
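The core zero-shot step (matching a predicted attribute vector against class attribute signatures, so unseen classes can still be assigned) can be sketched as below. The gesture names, the four toy attributes, and their values are hypothetical; the paper's database uses 64 attributes:

```python
def cosine(a, b):
    """Cosine similarity between two attribute vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = sum(x * x for x in a) ** 0.5
    nb = sum(y * y for y in b) ** 0.5
    return dot / (na * nb + 1e-9)

# Hypothetical attribute signatures; columns might encode properties such as
# one-handed, circular motion, lateral motion, repeated motion.
GESTURE_ATTRS = {
    "swipe":  [1, 0, 1, 0],
    "circle": [1, 1, 0, 0],
    "wave":   [1, 0, 1, 1],
}

def zsl_classify(predicted_attrs):
    """Assign the class (possibly unseen during training) whose attribute
    signature is closest to the attribute vector predicted from video."""
    return max(GESTURE_ATTRS, key=lambda g: cosine(GESTURE_ATTRS[g], predicted_attrs))

print(zsl_classify([0.9, 0.1, 0.8, 0.1]))  # → swipe
```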
Citations: 5
Understanding Applicants' Reactions to Asynchronous Video Interviews Through Self-reports and Nonverbal Cues
Pub Date : 2020-10-21 DOI: 10.1145/3382507.3418869
Skanda Muralidhar, E. Kleinlogel, E. Mayor, Adrian Bangerter, M. S. Mast, D. Gática-Pérez
Asynchronous video interviews (AVIs) are increasingly used by organizations in their hiring process. In this mode of interviewing, applicants are asked to record their responses to predefined interview questions using a webcam via an online platform. AVI usage has grown because of the perceived benefits to employers in cost and scale. However, little research has been conducted on applicants' reactions to these new interview methods. In this work, we investigate applicants' reactions to an AVI platform using self-reported measures previously validated in the psychology literature, and we examine how these measures relate to the nonverbal behavior displayed during the interviews. We find that participants who found the platform creepy and had concerns about privacy reported lower interview performance than participants without such concerns. We also observe weak correlations between the nonverbal cues displayed and these self-reported measures. Finally, inference experiments achieve overall low performance in explaining applicants' reactions. Overall, our results reveal that participants who are not at ease with AVIs (i.e., who have a high creepy-ambiguity score) might be unfairly penalized. This has implications for improving hiring practices that use AVIs.
Citations: 4
Speaker-Invariant Adversarial Domain Adaptation for Emotion Recognition
Pub Date : 2020-10-21 DOI: 10.1145/3382507.3418813
Yufeng Yin, Baiyu Huang, Yizhen Wu, M. Soleymani
Automatic emotion recognition methods are sensitive to variations across datasets, and their performance drops when evaluated across corpora. Domain adaptation techniques, e.g., the Domain-Adversarial Neural Network (DANN), can mitigate this problem. Though the DANN can detect and remove the bias between corpora, the bias between speakers remains, which reduces performance. In this paper, we propose the Speaker-Invariant Domain-Adversarial Neural Network (SIDANN) to reduce both the domain bias and the speaker bias. Specifically, on top of the DANN, we add a speaker discriminator with a gradient reversal layer (GRL) to unlearn information representing speakers' individual characteristics. Our experiments with multimodal data (speech, vision, and text) and the cross-domain evaluation indicate that the proposed SIDANN outperforms the DANN model (+5.6% and +2.8% on average for detecting arousal and valence, respectively), suggesting that the SIDANN has better domain adaptation ability than the DANN. Moreover, the modality contribution analysis shows that acoustic features are the most informative for arousal detection, while lexical features perform best for valence detection.
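The gradient reversal layer used by DANN-style models (including the speaker discriminator here) is simple to state: identity in the forward pass, gradient negated and scaled by a factor lambda in the backward pass, so that minimizing the discriminator's loss pushes the shared encoder to remove the information the discriminator exploits. A minimal scalar sketch, with the lambda value and toy gradient as illustrative assumptions:

```python
def grl_forward(x):
    # Identity: the discriminator sees the encoder features unchanged.
    return x

def grl_backward(upstream_grad, lam=1.0):
    # Gradients flowing back into the encoder are reversed and scaled, so the
    # encoder update makes domains (or speakers) *harder* to tell apart.
    return -lam * upstream_grad

# Toy check: a gradient of +2.0 from the speaker discriminator arrives at the
# encoder as -2.0, i.e. the encoder moves against the discriminator.
print(grl_forward(5.0))   # → 5.0
print(grl_backward(2.0))  # → -2.0
```

In an autograd framework this would be implemented as a custom function with these two passes (e.g. a `torch.autograd.Function` subclass in PyTorch); the scalar version above only illustrates the sign flip.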
Citations: 19
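The core idea of the gradient reversal layer (GRL) described above can be shown in a minimal scalar sketch: the encoder receives the task gradient as usual, but the speaker discriminator's gradient reaches it with its sign flipped, so the encoder learns features that are uninformative about speaker identity. The toy squared-error losses and one-weight encoder below are illustrative assumptions, not the paper's actual architecture.

```python
def sidann_encoder_grad(w, x, y_emotion, y_speaker, lam=0.5):
    """Gradient w.r.t. a scalar encoder weight in a DANN/SIDANN-style setup.

    Forward: feature h = w * x feeds both an emotion head and a speaker
    discriminator (both modeled here as identity heads with squared-error
    losses -- a toy assumption).  The gradient reversal layer flips the
    sign of the discriminator's gradient before it reaches the encoder.
    """
    h = w * x
    grad_emotion = 2.0 * (h - y_emotion) * x   # encoder grad from the task loss
    grad_speaker = 2.0 * (h - y_speaker) * x   # encoder grad from the discriminator
    # GRL: subtract (rather than add) the discriminator's gradient, turning
    # minimisation of the speaker loss into maximisation at the encoder.
    return grad_emotion - lam * grad_speaker

g = sidann_encoder_grad(w=1.0, x=2.0, y_emotion=3.0, y_speaker=1.0, lam=0.5)
# h = 2.0, grad_emotion = -4.0, grad_speaker = 4.0, so g = -6.0
```

Without the reversal the combined gradient would be `grad_emotion + lam * grad_speaker`; the sign flip is the entire mechanism that makes the learned features speaker-invariant.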
How Good is Good Enough?: The Impact of Errors in Single Person Action Classification on the Modeling of Group Interactions in Volleyball
Pub Date : 2020-10-21 DOI: 10.1145/3382507.3418846
Lian Beenhakker, F. Salim, D. Postma, R. V. Delden, D. Reidsma, B. Beijnum
In Human Behaviour Understanding, social interaction is often modeled on the basis of lower-level action recognition. The accuracy of this recognition has an impact on the system's capability to detect higher-level social events, and thus on the usefulness of the resulting system. We model team interactions in volleyball and investigate, through simulation of typical error patterns, how one can consider the required quality (in accuracy and in allowable types of errors) of the underlying action recognition for automated volleyball monitoring. Our proposed approach simulates different patterns of errors, grounded in related work in volleyball action recognition, on top of a manually annotated ground truth to model their different impacts on interaction recognition. Our results show that this can provide a means to quantify the effect of different types of classification errors on the overall quality of the system. Our chosen volleyball use case, in the rising field of sports monitoring, also addresses specific team-related challenges in such a system and how these can be visualized to grasp the interdependencies. In our use case, the first layer of our system classifies actions of individual players and the second layer recognizes multiplayer exercises and complexes (i.e., sequences in rallies) to enhance training. The experiments performed for this study investigated how errors at the action-recognition layer propagate and cause errors at the complexes layer. We discuss the strengths and weaknesses of the layered system for modeling volleyball rallies. We also give indications regarding what kinds of errors cause more problems and what choices can follow from them. In our given context we suggest that for recognition of non-Freeball actions (e.g. smash, block) it is more important to achieve a higher accuracy, which can be done at the cost of accuracy of classification of Freeball actions (which are mostly plays between team members and are more interchangeable as to their role in the complexes).
Citations: 0
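The simulation approach the abstract describes — injecting typical action-classification errors into a ground-truth sequence and measuring how they propagate to the second, complex-recognition layer — can be sketched in a few lines. The specific confusion pattern (smashes mislabelled as Freeballs), the rally structure, and the toy "Complex I" rule below are illustrative assumptions, not the paper's actual error model or recognizer.

```python
import random

def inject_errors(ground_truth, confusion, rng):
    """Simulate an imperfect first-layer action classifier.

    `confusion` maps an action label to (wrong_label, probability),
    mirroring the idea of grounding simulated error patterns in typical
    volleyball-recognition confusions (the pattern itself is assumed).
    """
    noisy = []
    for action in ground_truth:
        wrong, p = confusion.get(action, (action, 0.0))
        noisy.append(wrong if rng.random() < p else action)
    return noisy

def recognize_complex(rally):
    """Toy second-layer recognizer: call a rally 'Complex I' (side-out)
    if it opens with reception -> set -> smash."""
    return rally[:3] == ["reception", "set", "smash"]

rng = random.Random(0)                         # fixed seed for reproducibility
truth = [["reception", "set", "smash"]] * 1000  # 1000 identical ground-truth rallies
confusion = {"smash": ("freeball", 0.2)}        # 20% of smashes mislabelled
hits = sum(recognize_complex(inject_errors(r, confusion, rng)) for r in truth)
acc = hits / len(truth)   # complex-level accuracy under action-level noise (about 0.8 here)
```

Varying the per-action error probabilities in `confusion` and re-running gives exactly the kind of curve the paper uses to argue which action-level errors the complex layer can tolerate.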