Scalable environmental sounds analysis
K. Biatov
2009 3rd International Conference on Signal Processing and Communication Systems, 2009-10-30
DOI: 10.1109/ICSPCS.2009.5306423 (https://doi.org/10.1109/ICSPCS.2009.5306423)
This paper describes a method for analyzing environmental audio events. The audio events are modeled with a common universal codebook built on the bag-of-frames (BOF) approach. Frame-level features extracted from all audio files are grouped into clusters with the k-means algorithm. Each audio file is then modeled by the normalized distribution of its frames over the cluster bins, so that each file is described by a single vector. The audio data are represented as a feature-file matrix, analogous to the term-document matrix in Latent Semantic Indexing (LSI), and LSI is applied to this matrix to represent the data in a latent semantic space. The primary file description is then converted into a vector of similarities to anchor reference data, for which the training data are used: each component of the vector is a probabilistic similarity between the target file and the corresponding anchor reference file. LSI is applied once more to the resulting feature-file matrix, mapping the data to a latent semantic space within the anchor reference space. The nearest-neighbor (NN) algorithm is used for audio recognition and audio retrieval. The described data representation improves the results of audio retrieval and recognition.
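The pipeline outlined in the abstract can be illustrated with a minimal NumPy sketch. It is not the author's implementation, and several choices are assumptions made only for illustration: the frames are synthetic Gaussian vectors standing in for real per-frame audio features, cosine similarity stands in for the "probabilistic similarity" that the abstract does not define, LSI is taken to be a truncated SVD of the feature-file matrix, and the names `AnchorLSI`, `make_file`, the class labels "A"/"B", the codebook size `k=4` and `rank=2` are all hypothetical.

```python
import numpy as np

def assign(frames, codebook):
    """Index of the nearest codebook centroid for every frame."""
    d2 = ((frames[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)

def kmeans(frames, k, iters=25, seed=0):
    """Plain Lloyd's k-means over pooled frames; returns a (k, d) codebook."""
    rng = np.random.default_rng(seed)
    codebook = frames[rng.choice(len(frames), size=k, replace=False)].copy()
    for _ in range(iters):
        labels = assign(frames, codebook)
        for j in range(k):
            members = frames[labels == j]
            if len(members):  # an emptied cluster keeps its previous centroid
                codebook[j] = members.mean(axis=0)
    return codebook

def bof_histogram(frames, codebook):
    """Normalized count of frames per cluster bin: one vector per file."""
    counts = np.bincount(assign(frames, codebook), minlength=len(codebook))
    return counts / counts.sum()

def lsi_basis(X, rank):
    """Leading left singular vectors of a feature-by-file matrix."""
    U, _, _ = np.linalg.svd(X, full_matrices=False)
    return U[:, :rank]

def cosine(A, B):
    """Cosine similarity between every column of A and every column of B."""
    A = A / (np.linalg.norm(A, axis=0, keepdims=True) + 1e-12)
    B = B / (np.linalg.norm(B, axis=0, keepdims=True) + 1e-12)
    return A.T @ B

class AnchorLSI:
    """Codebook -> histograms -> LSI -> anchor similarities -> LSI -> NN."""

    def fit(self, train_files, labels, k=4, rank=2):
        self.labels = np.asarray(labels)
        self.codebook = kmeans(np.vstack(train_files), k)
        X = self._histograms(train_files)        # (k, n_files)
        self.U1 = lsi_basis(X, rank)
        self.anchors = self.U1.T @ X             # training files as anchors
        S = cosine(self.anchors, self.anchors)   # (n_anchors, n_files)
        self.U2 = lsi_basis(S, rank)             # second LSI, on similarities
        self.train_repr = self.U2.T @ S
        return self

    def _histograms(self, files):
        return np.column_stack([bof_histogram(f, self.codebook) for f in files])

    def transform(self, files):
        Z = self.U1.T @ self._histograms(files)  # first latent space
        return self.U2.T @ cosine(self.anchors, Z)

    def predict(self, files):
        """Label of the nearest training file (cosine) for each query file."""
        sims = cosine(self.train_repr, self.transform(files))
        return self.labels[sims.argmax(axis=0)]

def make_file(rng, center, n_frames=40, dim=3):
    """Synthetic stand-in for the per-frame features of one audio file."""
    return rng.normal(loc=center, scale=0.5, size=(n_frames, dim))

rng = np.random.default_rng(1)
train = [make_file(rng, c) for c in (0.0, 0.0, 0.0, 5.0, 5.0, 5.0)]
model = AnchorLSI().fit(train, ["A", "A", "A", "B", "B", "B"])
print(model.predict([make_file(rng, 5.0), make_file(rng, 0.0)]))
```

Query files are folded into each latent space by projecting onto the left singular vectors learned from the training files, so new files can be scored without recomputing either decomposition. For retrieval rather than recognition, the similarity scores in `predict` could be sorted to return a ranked list of training files instead of only the top match.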