A Hybrid Approach for Mobile Phone Clustering with Speech Recordings
Mou Wang, Xiao-Lei Zhang, S. Rahardja
2019 Twelfth International Conference on Ubi-Media Computing (Ubi-Media), published 2019-08-01
DOI: https://doi.org/10.1109/Ubi-Media.2019.00047
Acquisition device clustering based on speech recordings is a critical problem in speech forensics, especially mobile phone clustering (MPC). Previous studies on mobile phone recognition or clustering fall mainly into two approaches. One uses handcrafted features such as Mel-frequency cepstral coefficients (MFCCs), while the other uses features learned by neural networks. In this paper, we propose a hybrid system for MPC. Specifically, we first extract supervectors from MFCCs with a Gaussian mixture model and obtain deep bottleneck features from a deep auto-encoder network. Then, we feed the two features to spectral clustering separately, and the Laplacian eigen-decomposition step of spectral clustering yields one low-dimensional vector per feature type. Finally, we fuse the two vectors and cluster the fused feature with k-means. The proposed method is evaluated on the public MOBIPHONE corpus. The results show that the method is effective and, moreover, that the supervectors and deep bottleneck features provide complementary information about the intrinsic characteristics of the speech recordings captured by the mobile phones.
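The embedding, fusion, and clustering stages described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. Several details are assumptions made only for illustration: the Gaussian affinity and its bandwidth heuristic, the symmetric normalized Laplacian with row normalization, fusion by simple concatenation, and the farthest-point k-means initialization. The function names (`spectral_embedding`, `kmeans`, `hybrid_cluster`) are hypothetical, and random synthetic arrays stand in for the GMM supervectors and deep bottleneck features, which are not computed here.

```python
import numpy as np


def spectral_embedding(X, k, scale=0.5):
    """Embed rows of X using the k smallest eigenvectors of the normalized Laplacian."""
    sq = np.sum(X ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    # Assumed bandwidth heuristic: a fraction of the median pairwise distance.
    sigma = scale * np.sqrt(np.median(d2[d2 > 0]))
    W = np.exp(-d2 / (2.0 * sigma ** 2))
    np.fill_diagonal(W, 0.0)
    d_inv_sqrt = 1.0 / np.sqrt(W.sum(axis=1))
    L = np.eye(len(X)) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    _, vecs = np.linalg.eigh(L)  # eigenvalues are returned in ascending order
    U = vecs[:, :k]
    return U / np.linalg.norm(U, axis=1, keepdims=True)


def kmeans(X, k, n_iter=100, seed=0):
    """Plain Lloyd's k-means with farthest-point initialization (an assumption)."""
    rng = np.random.default_rng(seed)
    centers = [X[rng.integers(len(X))]]
    for _ in range(1, k):
        dist = np.min([np.sum((X - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(X[np.argmax(dist)])
    centers = np.array(centers)
    for _ in range(n_iter):
        dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        updated = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(updated, centers):
            break
        centers = updated
    return labels


def hybrid_cluster(feat_a, feat_b, k):
    """Embed each feature type separately, concatenate, then run k-means."""
    fused = np.hstack([spectral_embedding(feat_a, k), spectral_embedding(feat_b, k)])
    return kmeans(fused, k)


# Synthetic stand-ins: 3 hypothetical phones, 20 recordings each.
rng = np.random.default_rng(0)
phone = np.repeat(np.arange(3), 20)
supervectors = 20.0 * np.eye(3, 8)[phone] + rng.normal(size=(60, 8))
bottleneck = 20.0 * np.eye(3, 5)[phone] + rng.normal(size=(60, 5))
labels = hybrid_cluster(supervectors, bottleneck, k=3)
print(labels)
```

Each feature type gets its own affinity matrix and eigen-decomposition, so the two embeddings have the same dimensionality (k) before fusion regardless of the original feature sizes. This keeps either feature type from dominating the concatenated vector purely through its length.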