Application of Fischer semi discriminant analysis for speaker diarization in Costa Rican radio broadcasts

Tecnologia en Marcha (IF 0.1, Q4, Multidisciplinary Sciences) | Pub Date: 2022-11-16 | DOI: 10.18845/tm.v35i8.6464
Roberto Sánchez Cárdenas, Marvin Coto-Jiménez

Abstract

Automatic segmentation and classification of audio streams is a challenging problem with many applications, such as indexing multimedia digital libraries, information retrieval, and building speech corpora (or spoken corpora) for particular languages and accents. Such a corpus is a database of speech audio files and their corresponding text transcriptions. Among the steps and tasks these applications require, speaker diarization is one of the most relevant, because it aims to find boundaries in the audio recordings according to who speaks in each fragment. Speaker diarization can be performed in a supervised or unsupervised way and is commonly applied to audio consisting of pure speech. This work presents a first annotated dataset and analysis of speaker diarization for Costa Rican radio broadcasting, using two approaches: a classic one based on k-means clustering, and the more recent Fischer Semi Discriminant analysis. We chose publicly available radio broadcasts and compared the applicability of both systems to the complete audio files, which also contain segments of music and challenging acoustic conditions. The results depend on the number of speakers in each broadcast, especially the average cluster purity. They also show the need for further exploration and for combination with other classification and segmentation algorithms, to better extract useful information from the dataset and support further development of speech corpora.
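The abstract names average cluster purity as the metric most affected by the number of speakers but does not define it. The sketch below illustrates one common definition, given here as an assumption and not necessarily the one used in the paper: each cluster is credited with the segments of its dominant speaker, and the credited segments are divided by the total number of segments. The function name and the toy labels are hypothetical.

```python
from collections import Counter, defaultdict


def average_cluster_purity(cluster_ids, speaker_ids):
    """Size-weighted cluster purity (assumed definition, see lead-in).

    cluster_ids: cluster label assigned to each segment by the diarizer.
    speaker_ids: reference (annotated) speaker label for each segment.
    """
    if len(cluster_ids) != len(speaker_ids):
        raise ValueError("label sequences must have the same length")
    if not cluster_ids:
        raise ValueError("at least one segment is required")

    # Count how often each reference speaker appears inside each cluster.
    speakers_by_cluster = defaultdict(Counter)
    for cluster, speaker in zip(cluster_ids, speaker_ids):
        speakers_by_cluster[cluster][speaker] += 1

    # Credit each cluster with the segments of its dominant speaker.
    dominant_total = sum(
        max(counts.values()) for counts in speakers_by_cluster.values()
    )
    return dominant_total / len(cluster_ids)


# Toy example (hypothetical labels): 8 segments, 3 clusters,
# two speakers plus a music segment.
clusters = [0, 0, 0, 1, 1, 1, 1, 2]
speakers = ["A", "A", "B", "B", "B", "B", "A", "music"]
purity = average_cluster_purity(clusters, speakers)
print(f"average cluster purity: {purity:.2f}")
```

Under this definition, purity never decreases when clusters are split further, which is one reason it is usually read together with a complementary measure such as speaker purity.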