DNN-based Embeddings for Speaker Diarization in the AuDIaS-UAM System for the Albayzin 2018 IberSPEECH-RTVE Evaluation

IberSPEECH Conference, Pub Date: 2018-11-21, DOI: 10.21437/IBERSPEECH.2018-46

Alicia Lozano-Diez, Beltran Labrador, Diego de Benito-Gorrón, Pablo Ramirez, D. Toledano

Citations: 3

Abstract

This document describes the three systems submitted by the AuDIaS-UAM team to the Albayzin 2018 IberSPEECH-RTVE speaker diarization evaluation. Two of our systems (the primary and contrastive 1 submissions) are based on embeddings, that is, fixed-length representations of a given audio segment obtained from a deep neural network (DNN) trained for speaker classification. The third system (contrastive 2) uses the classical i-vector as the representation of the audio segments. The resulting embeddings or i-vectors are then grouped using Agglomerative Hierarchical Clustering (AHC) to obtain the diarization labels. The new DNN-embedding approach to speaker diarization achieved remarkable performance on the Albayzin development dataset, similar to that of the well-known i-vector approach.
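The clustering step described in the abstract can be illustrated with a short Python sketch. This is not the AuDIaS-UAM implementation: the abstract does not state the similarity measure, linkage criterion, or stopping rule, so cosine distance, average linkage, and a fixed distance threshold are assumptions here. The segment representations (DNN embeddings or i-vectors) are assumed to be already extracted, one non-zero vector per audio segment, and the function names `cosine_distance` and `ahc_diarize` are hypothetical.

```python
import math
from typing import List, Sequence


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """1 - cosine similarity between two non-zero embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (na * nb)


def ahc_diarize(embeddings: List[Sequence[float]], threshold: float) -> List[int]:
    """Assumed average-linkage AHC: repeatedly merge the two closest clusters
    until the closest pair is farther apart than `threshold` (hypothetical
    tuning parameter). Returns one integer speaker label per segment."""
    n = len(embeddings)
    # Pairwise segment distances, computed once.
    dist = [[cosine_distance(embeddings[i], embeddings[j]) for j in range(n)]
            for i in range(n)]
    clusters = [[i] for i in range(n)]  # each segment starts as its own cluster

    def linkage(c1: List[int], c2: List[int]) -> float:
        # Average linkage: mean distance over all cross-cluster segment pairs.
        return sum(dist[i][j] for i in c1 for j in c2) / (len(c1) * len(c2))

    while len(clusters) > 1:
        best = None
        for p in range(len(clusters)):
            for q in range(p + 1, len(clusters)):
                d = linkage(clusters[p], clusters[q])
                if best is None or d < best[0]:
                    best = (d, p, q)
        d, p, q = best
        if d > threshold:  # stopping criterion: no pair is close enough to merge
            break
        clusters[p].extend(clusters[q])  # merge cluster q into cluster p (p < q)
        del clusters[q]

    # Assign labels; clusters stay ordered by their first segment index.
    labels = [0] * n
    for label, members in enumerate(clusters):
        for seg in members:
            labels[seg] = label
    return labels


if __name__ == "__main__":
    # Four toy 2-D "embeddings": two segments per hypothetical speaker.
    segments = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.95]]
    print(ahc_diarize(segments, threshold=0.3))  # [0, 0, 1, 1]
```

With the toy vectors above, the two near-parallel pairs merge first, and the remaining two clusters are about 0.89 apart in average cosine distance, so a threshold of 0.3 stops the merging at two speakers. Because the number of speakers in a broadcast recording is not known in advance, a distance threshold rather than a fixed cluster count is one common way to stop AHC; in a setup like this it would typically be tuned on development data.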


