Audio Retrieval By Voice Imitation
Samah Khawaled, Mohamad Khateeb, Hadas Benisty
2018 IEEE International Conference on the Science of Electrical Engineering in Israel (ICSEE), 2018-12-01
DOI: 10.1109/ICSEE.2018.8646294 (https://doi.org/10.1109/ICSEE.2018.8646294)
Abstract
Existing sound retrieval systems are mostly based on textual queries. Describing a sound in text is not intuitive and is often inaccurate because it depends on the user's subjective impression; different people may use different words to describe the same sound, which makes these systems complex to design and unintuitive to use. Vocal imitation, however, is the most natural human way to describe a sound. In this paper we consider an emerging approach to sound retrieval based on vocal imitations, where the user records an imitation of the desired sound and the system retrieves a ranked list of the most similar sounds in the dataset. In this work we represent sound signals using histograms obtained with respect to a Gaussian Mixture Model (GMM) that represents the spectral domain. This recently proposed approach was successfully applied to word representation in a keyword spotting task. Having a fixed-length representation for vocal imitation signals allows us to train a robust classifier using a support vector machine (SVM). Given a test imitation signal, we apply the classifier and use the output score to rank the retrieved signals, based on a majority vote. Our simulation results show that the proposed system yields a more accurate ranking than other existing solutions.
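The fixed-length histogram representation can be illustrated with a minimal sketch. It is not the authors' implementation, and the abstract does not give these details. The sketch assumes a diagonal-covariance GMM whose parameters have already been fit, hard assignment of each spectral frame to its most likely component, and a simple vote count over predicted class labels for the ranking step. The function names `gmm_histogram` and `rank_by_votes` are hypothetical.

```python
import math
from collections import Counter


def log_gaussian_diag(x, mean, var):
    """Log-density of a diagonal-covariance Gaussian evaluated at frame x."""
    return -0.5 * sum(
        math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v
        for xi, m, v in zip(x, mean, var)
    )


def gmm_histogram(frames, weights, means, variances):
    """Map a variable-length sequence of frames to a K-bin histogram.

    Assumption of this sketch: each frame votes for its single most likely
    GMM component (hard assignment); counts are normalized to sum to 1.
    """
    if not frames:
        raise ValueError("at least one frame is required")
    k = len(weights)
    counts = [0] * k
    for x in frames:
        scores = [
            math.log(weights[j]) + log_gaussian_diag(x, means[j], variances[j])
            for j in range(k)
        ]
        counts[scores.index(max(scores))] += 1
    return [c / len(frames) for c in counts]


def rank_by_votes(predicted_labels):
    """Rank class labels by number of votes, most voted first.

    Assumption: each element is one classifier decision for the query.
    """
    return [label for label, _ in Counter(predicted_labels).most_common()]
```

Signals of different durations map to vectors of the same length (the number of GMM components), which is what allows a standard classifier such as an SVM to be trained on them. In practice the frames would be spectral feature vectors rather than the one-dimensional toy values used to exercise the sketch.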