A Speaker Count System for Telephone Conversations

Psj319l Maz, Uchechukwu Ofoegbu, Ananth N. Iyer, Robert E. Yantorno, B. Y. Smolenski
Published in: 2006 International Symposium on Intelligent Signal Processing and Communications, 2006-12-01
DOI: 10.1109/ISPACS.2006.364899 (https://doi.org/10.1109/ISPACS.2006.364899)
Citations: 11

Abstract

In telephone conversations, only short consecutive utterances can be examined for each speaker. Discriminating between speakers in such conversations is therefore a challenging task, and it becomes even more challenging when no information about the speakers is known a priori. In this paper, a technique for determining the number of speakers participating in a telephone conversation is presented. The approach assumes no prior knowledge of any of the participating speakers. The technique is based on comparing short utterances within the conversation and deciding whether or not they belong to the same speaker. Applications of this research include three-way call detection and speaker tracking, and it could be extended to speaker change-point detection and indexing. The proposed method involves an elimination process in which speech segments matching a chosen set of reference models are sequentially removed from the conversation. Models are formed using the mean vectors and covariance matrices of linear predictive cepstral coefficients of voiced segments in the conversation. The use of the Mahalanobis distance, based on likelihood ratio testing, to determine whether two models belong to the same speaker or to different speakers is investigated. The relative amount of residual speech is observed after each elimination process to determine whether an additional speaker is present. Experiments were performed on 4000 artificial conversations from the HTIMIT database. The proposed system yielded an average speaker count accuracy of 78%.
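The elimination process described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, and several parts are assumptions made for illustration only: each segment is taken to be a list of feature frames already extracted (standing in for the cepstral coefficients of voiced speech), covariances are simplified to diagonal form, the first remaining segment serves as the reference model, and the names `segment_model`, `mahalanobis_sq`, `count_speakers`, `toy_segment`, `same_speaker_threshold`, and `residual_fraction` are hypothetical.

```python
from statistics import fmean, pvariance


def segment_model(frames):
    """Model a segment by its mean vector and per-dimension variance.

    Assumption: a diagonal covariance replaces the full covariance matrix
    mentioned in the abstract, to keep the sketch dependency-free.
    """
    dims = list(zip(*frames))
    mean = [fmean(d) for d in dims]
    var = [pvariance(d) + 1e-6 for d in dims]  # small floor avoids division by zero
    return mean, var


def mahalanobis_sq(model_a, model_b):
    """Squared Mahalanobis distance between two segment means.

    Assumption: the two diagonal covariances are averaged into one.
    """
    (mean_a, var_a), (mean_b, var_b) = model_a, model_b
    return sum(
        (a - b) ** 2 / ((va + vb) / 2.0)
        for a, b, va, vb in zip(mean_a, mean_b, var_a, var_b)
    )


def count_speakers(segments, same_speaker_threshold, residual_fraction):
    """Count speakers by repeatedly eliminating segments that match a reference.

    Both thresholds are hypothetical tuning parameters; the abstract does not
    state how the decision thresholds are chosen.
    """
    models = [segment_model(s) for s in segments]
    remaining = list(range(len(models)))
    total = len(models)
    count = 0
    # Enough residual speech left over is taken as evidence of another speaker.
    while remaining and len(remaining) / total > residual_fraction:
        reference = models[remaining[0]]  # assumption: first remaining segment
        count += 1
        # Keep only segments that do NOT match the reference model.
        remaining = [
            i for i in remaining[1:]
            if mahalanobis_sq(reference, models[i]) > same_speaker_threshold
        ]
    return count


def toy_segment(cx, cy, shift=0.0):
    """Build a tiny synthetic 2-D segment around (cx, cy); toy data only."""
    offsets = [(-0.1, 0.1), (0.1, -0.1), (0.0, 0.2), (0.2, 0.0), (-0.2, -0.2)]
    return [[cx + dx + shift, cy + dy - shift] for dx, dy in offsets]


conversation = [
    toy_segment(0, 0), toy_segment(5, 5),
    toy_segment(0, 0, 0.05), toy_segment(5, 5, 0.05),
]
print(count_speakers(conversation, same_speaker_threshold=9.0,
                     residual_fraction=0.1))
```

On this toy conversation, segments from the two synthetic clusters alternate, so one elimination pass removes the first cluster and leaves half of the segments as residual speech, which triggers a second pass. Real telephone speech would need actual feature extraction and thresholds tuned on data, neither of which the sketch attempts.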