SCAN: Learning Speaker Identity from Noisy Sensor Data

Chris Xiaoxuan Lu, Hongkai Wen, Sen Wang, A. Markham, A. Trigoni
2017 16th ACM/IEEE International Conference on Information Processing in Sensor Networks (IPSN)
DOI: 10.1145/3055031.3055073
Published: 2017-04-18
Citations: 17

Abstract

Sensor data acquired simultaneously from multiple sensors features increasingly in our ever more pervasive world. Buildings can be made smarter and more efficient, and spaces more responsive to users. A fundamental building block towards smart spaces is the ability to understand who is present in a certain area. A ubiquitous way of detecting this is to exploit the unique vocal features as people interact with one another. As an example, consider audio features sampled during a meeting, yielding a noisy set of possible voiceprints. With a number of meetings and knowledge of participation (e.g., through a calendar or MAC address), can we learn to associate a specific identity with a particular voiceprint? Obviously, enrolling users into a biometric database is time-consuming and not robust to vocal deviations over time. To address this problem, the standard approach is to perform a clustering step (e.g., of audio data) followed by a data association step, when identity-rich sensor data is available. In this paper we show that this approach is not robust to noise in either type of sensor stream; to tackle this issue we propose a novel algorithm that jointly optimises the clustering and association process, yielding up to three times higher identification precision than approaches that execute these steps sequentially. We demonstrate the performance benefits of our approach in two case studies, one with acoustic and MAC datasets that we collected from meetings in a non-residential building, and another with an online dataset of recorded radio interviews.
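To make the sequential baseline concrete, the sketch below illustrates the two-step pipeline the abstract criticises: first cluster voiceprint features, then associate each cluster with an identity using a noisy side channel (e.g., MAC sightings). This is a toy illustration of the baseline, not the paper's joint SCAN algorithm; all data, parameters, and helper names here are hypothetical.

```python
# Hypothetical sketch of the sequential cluster-then-associate baseline.
# Not the paper's SCAN method: SCAN jointly optimises both steps.
import numpy as np

rng = np.random.default_rng(0)

# Toy "voiceprints": 2-D features from three hypothetical speakers.
true_speaker = rng.integers(0, 3, size=60)
centers = np.array([[0.0, 0.0], [5.0, 0.0], [0.0, 5.0]])
voiceprints = centers[true_speaker] + rng.normal(scale=0.4, size=(60, 2))

def kmeans(x, k, iters=20):
    """Plain k-means with farthest-point initialisation."""
    c = [x[0]]
    for _ in range(k - 1):
        d = np.min(((x[:, None] - np.array(c)[None]) ** 2).sum(-1), axis=1)
        c.append(x[d.argmax()])
    c = np.array(c)
    for _ in range(iters):
        labels = np.argmin(((x[:, None] - c[None]) ** 2).sum(-1), axis=1)
        c = np.array([x[labels == j].mean(0) for j in range(k)])
    return labels

# Step 1: cluster the voiceprints.
clusters = kmeans(voiceprints, 3)

# Step 2: associate clusters with identities using a noisy side channel
# (imagine MAC-address sightings that are correct ~80% of the time).
identity_hint = np.where(rng.random(60) < 0.8, true_speaker,
                         rng.integers(0, 3, size=60))

# Co-occurrence counts between cluster labels and hinted identities.
counts = np.zeros((3, 3), dtype=int)
for c_j, i in zip(clusters, identity_hint):
    counts[c_j, i] += 1

# Greedy association: each cluster takes its most frequent identity.
assoc = counts.argmax(axis=1)
pred = assoc[clusters]
precision = (pred == true_speaker).mean()
print(f"sequential-baseline precision: {precision:.2f}")
```

With well-separated toy clusters this baseline does fine; the paper's point is that when noise corrupts either the audio clusters or the identity hints, errors made in step 1 cannot be corrected in step 2, which is what motivates optimising the two steps jointly.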