Learning Discriminative Joint Embeddings for Efficient Face and Voice Association

Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval Pub Date : 2020-07-25 DOI:10.1145/3397271.3401302

Rui Wang, Xin Liu, Y. Cheung, Kai Cheng, Nannan Wang, Wentao Fan

{"title":"Learning Discriminative Joint Embeddings for Efficient Face and Voice Association","authors":"Rui Wang, Xin Liu, Y. Cheung, Kai Cheng, Nannan Wang, Wentao Fan","doi":"10.1145/3397271.3401302","DOIUrl":null,"url":null,"abstract":"Many cognitive researches have shown the natural possibility of face-voice association, and such potential association has attracted much attention in biometric cross-modal retrieval domain. Nevertheless, the existing methods often fail to explicitly learn the common embeddings for challenging face-voice association tasks. In this paper, we present to learn discriminative joint embedding for face-voice association, which can seamlessly train the face subnetwork and voice subnetwork to learn their high-level semantic features, while correlating them to be compared directly and efficiently. Within the proposed approach, we introduce bi-directional ranking constraint, identity constraint and center constraint to learn the joint face-voice embedding, and adopt bi-directional training strategy to train the deep correlated face-voice model. Meanwhile, an online hard negative mining technique is utilized to discriminatively construct hard triplets in a mini-batch manner, featuring on speeding up the learning process. Accordingly, the proposed approach is adaptive to benefit various face-voice association tasks, including cross-modal verification, 1:2 matching, 1:N matching, and retrieval scenarios. Extensive experiments have shown its improved performances in comparison with the state-of-the-art ones.","PeriodicalId":252050,"journal":{"name":"Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval","volume":"29 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2020-07-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"10","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3397271.3401302","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 10

Abstract

Many cognitive researches have shown the natural possibility of face-voice association, and such potential association has attracted much attention in biometric cross-modal retrieval domain. Nevertheless, the existing methods often fail to explicitly learn the common embeddings for challenging face-voice association tasks. In this paper, we present to learn discriminative joint embedding for face-voice association, which can seamlessly train the face subnetwork and voice subnetwork to learn their high-level semantic features, while correlating them to be compared directly and efficiently. Within the proposed approach, we introduce bi-directional ranking constraint, identity constraint and center constraint to learn the joint face-voice embedding, and adopt bi-directional training strategy to train the deep correlated face-voice model. Meanwhile, an online hard negative mining technique is utilized to discriminatively construct hard triplets in a mini-batch manner, featuring on speeding up the learning process. Accordingly, the proposed approach is adaptive to benefit various face-voice association tasks, including cross-modal verification, 1:2 matching, 1:N matching, and retrieval scenarios. Extensive experiments have shown its improved performances in comparison with the state-of-the-art ones.

查看原文

微信好友朋友圈 QQ好友复制链接

本刊更多论文

人脸与语音高效关联的学习判别联合嵌入

许多认知研究已经证明了人脸-语音关联的自然可能性，这种潜在关联在生物识别跨模态检索领域受到了广泛关注。然而，现有的方法往往不能明确地学习具有挑战性的人脸-语音关联任务的共同嵌入。本文提出了一种用于人脸-语音关联的学习判别联合嵌入方法，该方法可以无缝地训练人脸子网和语音子网学习它们的高级语义特征，并将它们关联起来进行直接有效的比较。在该方法中，我们引入双向排名约束、身份约束和中心约束来学习人脸-语音联合嵌入，并采用双向训练策略训练深度相关人脸-语音模型。同时，利用在线硬负挖掘技术，以小批量的方式判别构建硬三元组，加快了学习过程。因此，该方法适用于各种人脸语音关联任务，包括跨模态验证、1:2匹配、1:N匹配和检索场景。大量的实验表明，与最先进的产品相比，它的性能有所提高。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

求助全文

约1分钟内获得全文去求助

来源期刊

Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval

自引率

0.00%

发文量

期刊最新文献

MHM: Multi-modal Clinical Data based Hierarchical Multi-label Diagnosis Prediction Correlated Features Synthesis and Alignment for Zero-shot Cross-modal Retrieval DVGAN Models Versus Satisfaction: Towards a Better Understanding of Evaluation Metrics Global Context Enhanced Graph Neural Networks for Session-based Recommendation