
Proceedings of the 2020 International Conference on Multimodal Interaction: Latest Publications

Leniency to those who confess?: Predicting the Legal Judgement via Multi-Modal Analysis
Pub Date : 2020-10-21 DOI: 10.1145/3382507.3418893
Liang Yang, Jingjie Zeng, Tao Peng, Xi Luo, Jinghui Zhang, Hongfei Lin
Legal Judgement Prediction (LJP) is now under the spotlight. It usually consists of multiple sub-tasks, such as penalty prediction (fine and imprisonment) and the prediction of applicable articles of law. Penalty prediction is often closely related to the trial process, especially to the analysis of the criminal suspect's attitude, which influences the judgment of the presiding judge to some extent. In this paper, we first construct a multi-modal dataset of 517 intentional assault cases, which contains trial information as well as the attitude of the suspect. We then explore the relationship between the suspect's attitude and the term of imprisonment. Finally, we use the proposed multi-modal model to predict the suspect's attitude and compare it with several strong baselines. Our experimental results show that the attitude of the criminal suspect is closely related to penalty prediction, which provides a new perspective for LJP.
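As a rough illustration of what a multi-modal attitude classifier of this kind can look like, the sketch below fuses a text and an audio feature vector with a late-fusion head. This is a minimal, hypothetical PyTorch sketch, not the authors' model; the two-modality setup, all layer sizes, and the three attitude classes are assumptions made for illustration.

```python
# Minimal late-fusion sketch (assumed setup, not the paper's architecture):
# two modality encoders whose outputs are concatenated and classified into
# attitude categories.
import torch
import torch.nn as nn

class LateFusionAttitudeClassifier(nn.Module):
    def __init__(self, text_dim=768, audio_dim=128, hidden=256, n_classes=3):
        super().__init__()
        self.text_enc = nn.Sequential(nn.Linear(text_dim, hidden), nn.ReLU())
        self.audio_enc = nn.Sequential(nn.Linear(audio_dim, hidden), nn.ReLU())
        self.head = nn.Linear(2 * hidden, n_classes)  # fused representation -> attitude

    def forward(self, text_feat, audio_feat):
        fused = torch.cat([self.text_enc(text_feat), self.audio_enc(audio_feat)], dim=-1)
        return self.head(fused)

model = LateFusionAttitudeClassifier()
logits = model(torch.randn(4, 768), torch.randn(4, 128))  # batch of 4 cases
print(logits.shape)  # torch.Size([4, 3])
```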
Citations: 2
Gaze Tracker Accuracy and Precision Measurements in Virtual Reality Headsets
Pub Date : 2020-10-21 DOI: 10.1145/3382507.3418816
J. Kangas, Olli Koskinen, R. Raisamo
To effectively utilize a gaze tracker in user interaction, it is important to know the quality of the gaze data that it is measuring. We have developed a method to evaluate the accuracy and precision of gaze trackers in virtual reality headsets. The method consists of two software components. The first component is simulation software that calibrates the gaze tracker and then collects data by providing a gaze target that moves around the headset's field of view. The second component performs an off-line analysis of the logged gaze data and provides a number of accuracy and precision measurements. The analysis results describe the accuracy and precision of the gaze tracker in different directions inside the virtual 3D space. Our method combines the measurements into overall accuracy and precision. Visualizations of the measurements are created to reveal possible trends over the display area. Results from selected areas of the display are analyzed to find differences between the areas (for example, the middle versus the outer edge of the display, or the upper versus the lower part of the display).
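The paper's exact computations are not reproduced here, but the commonly used definitions of gaze accuracy (mean angular offset from the target) and precision (RMS of sample-to-sample angular differences) can be sketched as follows; the unit-vector representation and the toy data are assumptions.

```python
# Minimal sketch of standard gaze accuracy/precision measures (assumed
# definitions, not the paper's implementation).
import numpy as np

def angular_deg(u, v):
    """Angle in degrees between gaze direction vectors u and v (unit length assumed)."""
    dot = np.clip(np.sum(u * v, axis=-1), -1.0, 1.0)
    return np.degrees(np.arccos(dot))

def accuracy_precision(gaze_dirs, target_dir):
    gaze_dirs = gaze_dirs / np.linalg.norm(gaze_dirs, axis=-1, keepdims=True)
    target_dir = target_dir / np.linalg.norm(target_dir)
    accuracy = angular_deg(gaze_dirs, target_dir).mean()          # mean offset from target
    step_errors = angular_deg(gaze_dirs[1:], gaze_dirs[:-1])      # sample-to-sample jitter
    precision = np.sqrt(np.mean(step_errors ** 2))                # RMS of the jitter
    return accuracy, precision

# Toy example: 100 gaze samples scattered around a fixed target direction.
rng = np.random.default_rng(0)
target = np.array([0.0, 0.0, 1.0])
samples = target + 0.01 * rng.standard_normal((100, 3))
print(accuracy_precision(samples, target))
```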
Citations: 1
Punchline Detection using Context-Aware Hierarchical Multimodal Fusion
Pub Date : 2020-10-21 DOI: 10.1145/3382507.3418891
Akshat Choube, M. Soleymani
Humor has a history as old as humanity. It often induces laughter and elicits amusement and engagement. Humorous behavior is manifested in different modalities, including language, voice tone, and gestures; thus, automatic understanding of humorous behavior requires multimodal behavior analysis. Humor detection is a well-established problem in Natural Language Processing, but its multimodal analysis is less explored. In this paper, we present a context-aware hierarchical fusion network for multimodal punchline detection. The proposed neural architecture first fuses the modalities two by two and then fuses all three modalities. The network also models the context of the punchline using Gated Recurrent Units. The model is evaluated on the UR-FUNNY database, yielding state-of-the-art performance.
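A minimal sketch of the pairwise-then-trimodal fusion idea with a GRU context encoder is given below. It follows only the high-level structure described in the abstract; the feature dimensions, concatenation as the fusion operator, and the binary punchline output are assumptions, not the authors' implementation.

```python
# Hierarchical fusion sketch: fuse modalities two by two, then all three,
# and combine with a GRU-encoded context of preceding sentences.
import torch
import torch.nn as nn

class HierarchicalFusionPunchline(nn.Module):
    def __init__(self, d_text=300, d_audio=74, d_vision=35, d=64):
        super().__init__()
        self.proj = nn.ModuleDict({
            "t": nn.Linear(d_text, d), "a": nn.Linear(d_audio, d), "v": nn.Linear(d_vision, d)})
        self.pair = nn.ModuleList([nn.Linear(2 * d, d) for _ in range(3)])  # t+a, t+v, a+v
        self.tri = nn.Linear(3 * d, d)                                      # all three
        self.context_gru = nn.GRU(d, d, batch_first=True)                   # punchline context
        self.head = nn.Linear(2 * d, 1)                                     # punchline vs. not

    def forward(self, ctx_seq, t, a, v):
        t, a, v = self.proj["t"](t), self.proj["a"](a), self.proj["v"](v)
        ta = torch.relu(self.pair[0](torch.cat([t, a], -1)))
        tv = torch.relu(self.pair[1](torch.cat([t, v], -1)))
        av = torch.relu(self.pair[2](torch.cat([a, v], -1)))
        fused = torch.relu(self.tri(torch.cat([ta, tv, av], -1)))
        _, h = self.context_gru(ctx_seq)              # ctx_seq: (batch, steps, d)
        return self.head(torch.cat([fused, h[-1]], -1))

m = HierarchicalFusionPunchline()
out = m(torch.randn(2, 5, 64), torch.randn(2, 300), torch.randn(2, 74), torch.randn(2, 35))
print(out.shape)  # torch.Size([2, 1])
```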
Citations: 8
Analyzing Nonverbal Behaviors along with Praising
Pub Date : 2020-10-21 DOI: 10.1145/3382507.3418868
Toshiki Onishi, Arisa Yamauchi, Ryo Ishii, Y. Aono, Akihiro Miyata
In this work, as a first attempt to analyze the relationship between praising skills and human behavior in dialogue, we focus on head and face behavior. We create a new dialogue corpus that includes the face and head behavior of the person who gives praise (the praiser) and the person who receives it (the receiver), together with the degree of success of the praising (the praising score). We also build a machine learning model that uses features related to head and face behavior to estimate the praising score, and we clarify which features of the praiser and receiver are important for this estimation. The analysis showed that features of both the praiser and the receiver matter, with features related to utterance, head, gaze, and chin being particularly important. Examining the most important features revealed that the praiser and receiver should face each other without turning their heads to the left or right, and that the longer the praiser's utterance, the more successful the praising.
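As an illustration of this kind of feature-based estimation, the hypothetical sketch below regresses a praising score from praiser/receiver head- and face-behavior features and ranks the features by importance. The feature names, the random-forest choice, and the synthetic data are assumptions, not the authors' setup.

```python
# Regress a praising score from hand-crafted behavior features and inspect
# feature importance (illustrative placeholders only).
import numpy as np
from sklearn.ensemble import RandomForestRegressor

feature_names = ["praiser_utterance_len", "praiser_head_turn", "praiser_gaze_at_receiver",
                 "receiver_head_turn", "receiver_chin_motion", "receiver_gaze_at_praiser"]

rng = np.random.default_rng(0)
X = rng.random((200, len(feature_names)))                        # 200 dialogue segments (synthetic)
y = 3 + 2 * X[:, 0] - X[:, 1] + 0.1 * rng.standard_normal(200)   # synthetic praising score

model = RandomForestRegressor(n_estimators=200, random_state=0).fit(X, y)
for name, imp in sorted(zip(feature_names, model.feature_importances_), key=lambda p: -p[1]):
    print(f"{name}: {imp:.3f}")
```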
Citations: 5
Zero-Shot Learning for Gesture Recognition
Pub Date : 2020-10-21 DOI: 10.1145/3382507.3421161
Naveen Madapana
Zero-Shot Learning (ZSL) is a machine learning paradigm that aims to recognize classes that are not present in the training data; hence, it is capable of comprehending categories that were never seen before. While deep learning has pushed the limits of unseen object recognition, ZSL for temporal problems such as unfamiliar gesture recognition (referred to as ZSGL) remains unexplored. ZSGL has the potential to yield efficient human-machine interfaces that can recognize and understand the spontaneous and conversational gestures of humans. The objective of this work is therefore to conceptualize, model, and develop a framework to tackle ZSGL problems. The first step in the pipeline is to develop a database of gesture attributes that are representative of a range of categories. Next, a deep architecture consisting of convolutional and recurrent layers is proposed to jointly optimize the semantic and classification losses. Lastly, rigorous experiments are performed to compare the proposed model against existing ZSL models on the CGD 2013 and MSRC-12 datasets. In our preliminary work, we identified a list of 64 discriminative attributes related to gestures' morphological characteristics. Our approach yields an unseen-class accuracy of 41%, which outperforms state-of-the-art approaches by a considerable margin. Future work involves the following: 1. modifying the existing architecture to improve ZSL accuracy; 2. augmenting the database of attributes to incorporate semantic properties; 3. addressing the data imbalance inherent to ZSL problems; and 4. expanding this research to other domains such as surgeme and action recognition.
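A minimal attribute-based ZSL sketch in this spirit is shown below: a network maps gesture features to the 64-dimensional attribute space, and an unseen gesture is assigned to the class whose attribute vector is most similar. The input feature dimension, the cosine-similarity matching, and the toy data are assumptions rather than the paper's architecture.

```python
# Attribute-based zero-shot classification sketch: project features to the
# attribute space, then match against unseen-class attribute vectors.
import torch
import torch.nn as nn
import torch.nn.functional as F

N_ATTR, FEAT_DIM = 64, 128
attr_net = nn.Sequential(nn.Linear(FEAT_DIM, 256), nn.ReLU(), nn.Linear(256, N_ATTR))

def predict_unseen(features, unseen_class_attrs):
    """features: (batch, FEAT_DIM); unseen_class_attrs: (n_unseen, N_ATTR)."""
    pred_attrs = attr_net(features)                              # project to attribute space
    sims = F.cosine_similarity(pred_attrs.unsqueeze(1),          # (batch, n_unseen)
                               unseen_class_attrs.unsqueeze(0), dim=-1)
    return sims.argmax(dim=1)                                    # index of nearest unseen class

unseen_attrs = torch.rand(5, N_ATTR)      # attribute vectors of 5 unseen gesture classes
print(predict_unseen(torch.randn(3, FEAT_DIM), unseen_attrs))
```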
Citations: 5
Facilitating Flexible Force Feedback Design with Feelix
Pub Date : 2020-10-21 DOI: 10.1145/3382507.3418819
Anke van Oosterhout, M. Bruns, Eve E. Hoggan
In the last decade, haptic actuators have improved in quality and efficiency, enabling easier implementation in user interfaces. One of the next steps towards a mature haptics field is a larger and more diverse toolset that enables designers and novices to explore the design and implementation of haptic feedback in their projects. In this paper, we look at several design projects that utilize haptic force feedback to aid interaction between the user and the product. We analysed the process interaction designers went through when developing their haptic user interfaces. Based on our insights, we identified requirements for a haptic force feedback authoring tool. We discuss how these requirements are addressed by Feelix, a tool that supports sketching and refinement of haptic force feedback effects.
Citations: 7
Speaker-Invariant Adversarial Domain Adaptation for Emotion Recognition
Pub Date : 2020-10-21 DOI: 10.1145/3382507.3418813
Yufeng Yin, Baiyu Huang, Yizhen Wu, M. Soleymani
Automatic emotion recognition methods are sensitive to variations across different datasets, and their performance drops when evaluated across corpora. Domain adaptation techniques such as the Domain-Adversarial Neural Network (DANN) can mitigate this problem. Though the DANN can detect and remove the bias between corpora, the bias between speakers remains, which results in reduced performance. In this paper, we propose the Speaker-Invariant Domain-Adversarial Neural Network (SIDANN) to reduce both the domain bias and the speaker bias. Specifically, building on the DANN, we add a speaker discriminator with a gradient reversal layer (GRL) to unlearn information representing speakers' individual characteristics. Our experiments with multimodal data (speech, vision, and text) and the cross-domain evaluation indicate that the proposed SIDANN outperforms the DANN model (by +5.6% and +2.8% on average for detecting arousal and valence, respectively), suggesting that the SIDANN has a better domain adaptation ability than the DANN. In addition, the modality contribution analysis shows that acoustic features are the most informative for arousal detection, while lexical features perform best for valence detection.
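The gradient reversal layer is the standard building block behind DANN-style discriminators; a minimal PyTorch version is sketched below. It shows only the generic GRL mechanism, not the SIDANN architecture or its layer sizes.

```python
# Gradient reversal layer: identity in the forward pass, negated (and scaled)
# gradients in the backward pass, so the discriminator's training signal pushes
# the encoder toward invariant representations.
import torch

class GradReverse(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x, lam):
        ctx.lam = lam
        return x.view_as(x)                        # identity in the forward pass

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lam * grad_output, None        # reverse and scale gradients

def grad_reverse(x, lam=1.0):
    return GradReverse.apply(x, lam)

# Toy check: gradients flowing back through the layer are negated.
x = torch.ones(3, requires_grad=True)
grad_reverse(x, lam=1.0).sum().backward()
print(x.grad)   # tensor([-1., -1., -1.])
```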
Citations: 19
LieCatcher: Game Framework for Collecting Human Judgments of Deceptive Speech
Pub Date : 2020-10-21 DOI: 10.1145/3382507.3421166
Sarah Ita Levitan, James Shin, Ivy Chen, Julia Hirschberg
Humans are notoriously poor at detecting deception; most perform worse than chance. To address this issue, we have developed LieCatcher, a single-player, web-based Game With A Purpose (GWAP) that allows players to assess their lie detection skills while providing human judgments of deceptive speech. Players listen to audio recordings drawn from a corpus of deceptive and non-deceptive interview dialogues and guess whether the speaker is lying or telling the truth. They are awarded points for correct guesses, and at the end of the game they receive a score summarizing their lie detection performance. We present the game design and implementation, and describe a crowdsourcing experiment conducted to study perceived deception.
Citations: 4
Group-level Speech Emotion Recognition Utilising Deep Spectrum Features
Pub Date : 2020-10-21 DOI: 10.1145/3382507.3417964
Sandra Ottl, S. Amiriparian, Maurice Gerczuk, Vincent Karas, Björn Schuller
The objectives of this challenge paper are twofold: first, we apply a range of neural-network-based transfer learning approaches to cope with the data scarcity in the field of speech emotion recognition, and second, we fuse the obtained representations and predictions in an early and late fusion strategy to check the complementarity of the applied networks. In particular, we use our Deep Spectrum system to extract deep feature representations from the audio content of the 2020 EmotiW group-level emotion prediction challenge data. We evaluate a total of ten ImageNet pre-trained Convolutional Neural Networks, including AlexNet, VGG16, VGG19, and three DenseNet variants, as audio feature extractors. We compare their performance to the ComParE feature set used in the challenge baseline, employing simple logistic regression models trained with Stochastic Gradient Descent as classifiers. With the help of late fusion, our approach improves the accuracy on the test set from 47.88% to 62.70%.
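A Deep-Spectrum-style pipeline can be sketched roughly as follows: render the audio as a log-mel spectrogram image, take activations of an ImageNet-pretrained CNN as features, and train a logistic regression with SGD on top. The specific libraries (librosa, torchvision, scikit-learn), VGG16 as the extractor, and the synthetic clips below are assumptions standing in for the authors' exact toolchain, not a reproduction of it.

```python
# Spectrogram -> pretrained CNN activations -> SGD-trained logistic regression.
# Requires torchvision >= 0.13 (weights enum) and scikit-learn >= 1.1 ("log_loss").
import numpy as np
import torch
import librosa
from torchvision import models
from sklearn.linear_model import SGDClassifier

cnn = models.vgg16(weights="IMAGENET1K_V1").features.eval()   # downloads weights on first use

def deep_spectrum_features(wav, sr=16000):
    mel = librosa.power_to_db(librosa.feature.melspectrogram(y=wav, sr=sr, n_mels=224))
    img = np.stack([mel] * 3)                                  # fake RGB spectrogram "image"
    img = torch.tensor(img, dtype=torch.float32).unsqueeze(0)  # (1, 3, 224, time)
    with torch.no_grad():
        feat = cnn(img).mean(dim=(2, 3))                       # global-average-pool conv maps
    return feat.squeeze(0).numpy()

# Toy training run over synthetic clips (real use: one feature vector per labelled clip).
X = np.stack([deep_spectrum_features(np.random.randn(16000)) for _ in range(8)])
y = np.array([0, 1] * 4)
clf = SGDClassifier(loss="log_loss").fit(X, y)
print(clf.predict(X[:2]))
```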
Citations: 22
Understanding Applicants' Reactions to Asynchronous Video Interviews Through Self-reports and Nonverbal Cues
Pub Date : 2020-10-21 DOI: 10.1145/3382507.3418869
Skanda Muralidhar, E. Kleinlogel, E. Mayor, Adrian Bangerter, M. S. Mast, D. Gática-Pérez
Asynchronous video interviews (AVIs) are increasingly used by organizations in their hiring process. In this mode of interviewing, applicants are asked to record their responses to predefined interview questions using a webcam via an online platform. AVI usage has increased due to employers' perceived benefits in terms of cost and scale. However, little research has been conducted on applicants' reactions to these new interview methods. In this work, we investigate applicants' reactions to an AVI platform using self-reported measures previously validated in the psychology literature. We also investigate the connections between these measures and the nonverbal behavior displayed during the interviews. We find that participants who found the platform creepy and had concerns about privacy reported lower interview performance compared to participants who did not have such concerns. We also observe weak correlations between the nonverbal cues displayed and these self-reported measures. Finally, inference experiments achieve overall low performance with respect to explaining applicants' reactions. Overall, our results reveal that participants who are not at ease with AVIs (i.e., those with a high creepy ambiguity score) might be unfairly penalized. This has implications for improved hiring practices using AVIs.
Citations: 4