Authors: Lie Lu, A. Hanjalic
Published in: 2008 IEEE International Conference on Acoustics, Speech and Signal Processing (2008-05-12)
DOI: https://doi.org/10.1109/ICASSP.2008.4517544
Unsupervised anchor space generation for similarity measurement of general audio
Reliably measuring the similarity between audio clips is critical to many applications. In contrast to the conventional approach of measuring audio similarity directly on low-level features, this paper considers similarity computation in an anchor space. Each dimension of such a space corresponds to a semantic category (anchor). Mapping an audio clip onto this space yields a vector that indicates the membership probability of the clip with respect to each semantic category. The more similar the mappings of two audio clips, the more similar the clips are. While an anchor space is typically generated in a supervised fashion, a supervised approach is infeasible in many realistic scenarios where the audio content semantics are too diverse or simply unknown a priori. We therefore propose an unsupervised approach to anchor space generation: spectral clustering groups audio clips with similar low-level features, and the resulting clusters are adopted as semantic categories. Using this semantic space for audio similarity computation yields a considerable accuracy improvement (7% in mean average precision, mAP) in an audio retrieval system compared with the conventional approach based on low-level features.
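The mapping and comparison steps described in the abstract can be illustrated with a minimal sketch. The abstract does not say how membership probabilities are computed or which measure compares two mappings, so everything here is an assumption made for illustration: each anchor is represented by a hypothetical cluster centroid (standing in for the output of the spectral clustering step, which is not implemented), membership is a softmax over negative squared distances, and mappings are compared by cosine similarity. The function names and toy feature values are likewise hypothetical.

```python
import math


def map_to_anchor_space(features, anchors, temperature=1.0):
    """Map a low-level feature vector to membership probabilities over anchors.

    Assumption: each anchor is a cluster centroid and membership is a softmax
    over negative squared Euclidean distances. The paper may use another model.
    """
    scores = [
        -sum((f - a) ** 2 for f, a in zip(features, anchor)) / temperature
        for anchor in anchors
    ]
    # Subtract the maximum score before exponentiating for numerical stability.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cosine_similarity(u, v):
    """Cosine similarity between two anchor-space vectors (an assumed measure)."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


# Hypothetical centroids of three clusters found by an unsupervised step.
anchors = [[0.0, 0.0], [5.0, 5.0], [0.0, 5.0]]

# Toy low-level feature vectors for three audio clips.
clip_a = [0.2, 0.1]
clip_b = [0.1, 0.3]
clip_c = [4.8, 5.1]

vec_a = map_to_anchor_space(clip_a, anchors)
vec_b = map_to_anchor_space(clip_b, anchors)
vec_c = map_to_anchor_space(clip_c, anchors)

sim_ab = cosine_similarity(vec_a, vec_b)  # clips near the same anchor
sim_ac = cosine_similarity(vec_a, vec_c)  # clips near different anchors
```

In this toy setup, clips that fall near the same centroid receive nearly identical membership vectors and therefore a higher similarity than clips near different centroids, which is the behavior the anchor-space idea relies on.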