Chris Xiaoxuan Lu, Hongkai Wen, Sen Wang, A. Markham, A. Trigoni
Published in: 2017 16th ACM/IEEE International Conference on Information Processing in Sensor Networks (IPSN), 2017-04-18
DOI: 10.1145/3055031.3055073 (https://doi.org/10.1145/3055031.3055073)
SCAN: Learning Speaker Identity from Noisy Sensor Data
Sensor data acquired from multiple sensors simultaneously is increasingly common in our ever more pervasive world. Buildings can be made smarter and more efficient, and spaces more responsive to users. A fundamental building block of smart spaces is the ability to understand who is present in a certain area. A ubiquitous way of detecting this is to exploit the unique vocal features of people as they interact with one another. As an example, consider audio features sampled during a meeting, yielding a noisy set of possible voiceprints. Given a number of meetings and knowledge of participation (e.g. through a calendar or MAC address), can we learn to associate a specific identity with a particular voiceprint? Enrolling users into a biometric database is time-consuming and not robust to vocal deviations over time. The standard approach to this problem is to perform a clustering step (e.g. of audio data) followed by a data association step when identity-rich sensor data is available. In this paper we show that this approach is not robust to noise in either type of sensor stream. To tackle this issue, we propose a novel algorithm that jointly optimises the clustering and association process, yielding up to three times higher identification precision than approaches that execute these steps sequentially. We demonstrate the performance benefits of our approach in two case studies: one with acoustic and MAC datasets that we collected from meetings in a non-residential building, and another with an online dataset of recorded radio interviews.
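To make the sequential baseline concrete, the following is a minimal sketch of a "cluster first, then associate" pipeline on toy data. It is not the joint algorithm proposed in the paper, and every specific here is an assumption made for illustration: the leader-style clustering with a fixed radius, the Jaccard overlap score, the greedy one-to-one assignment, and all function and variable names (`cluster_voiceprints`, `associate`, `samples`, `attendance`) are hypothetical and do not come from the source.

```python
from collections import defaultdict
from math import dist


def cluster_voiceprints(samples, radius):
    """Leader clustering (illustrative choice): a sample joins the first
    cluster whose representative lies within `radius`, else starts a new one."""
    centres = []  # one representative feature vector per cluster
    labels = []
    for _, vec in samples:
        for idx, centre in enumerate(centres):
            if dist(vec, centre) <= radius:
                labels.append(idx)
                break
        else:
            centres.append(vec)
            labels.append(len(centres) - 1)
    return labels


def associate(samples, labels, attendance):
    """Score each (cluster, person) pair by the Jaccard overlap of the
    meetings they appear in, then assign greedily, one person per cluster."""
    meetings_of_cluster = defaultdict(set)
    for (meeting, _), label in zip(samples, labels):
        meetings_of_cluster[label].add(meeting)

    meetings_of_person = defaultdict(set)
    for meeting, people in attendance.items():
        for person in people:
            meetings_of_person[person].add(meeting)

    scored = []
    for label, cm in meetings_of_cluster.items():
        for person, pm in meetings_of_person.items():
            scored.append((len(cm & pm) / len(cm | pm), label, person))
    # Highest score first; ties broken by cluster index, then name.
    scored.sort(key=lambda t: (-t[0], t[1], t[2]))

    assignment, used = {}, set()
    for _, label, person in scored:
        if label not in assignment and person not in used:
            assignment[label] = person
            used.add(person)
    return assignment


# Toy data (made up): (meeting id, 2-D "voiceprint" feature vector).
samples = [
    ("m1", (0.1, 0.0)), ("m1", (5.0, 5.1)),
    ("m2", (5.1, 4.9)), ("m2", (10.0, 0.2)),
    ("m3", (0.0, 0.1)), ("m3", (9.9, 0.0)),
]
# Participation knowledge, e.g. from a calendar or observed MAC addresses.
attendance = {
    "m1": {"alice", "bob"},
    "m2": {"bob", "carol"},
    "m3": {"alice", "carol"},
}

labels = cluster_voiceprints(samples, radius=1.0)
assignment = associate(samples, labels, attendance)
print(labels)
print(assignment)
```

In this sketch the association step can only work with the clusters it is given, so a clustering mistake (for example, a noisy voiceprint that falls outside the radius and spawns a spurious cluster) carries straight through to the identity assignment. That dependency is the weakness the abstract attributes to sequential approaches.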