{"title":"Evaluation of Auditory Saliency Model Based on Saliency Map","authors":"Xiansong Xiong, Zhijun Zhao, Lingyun Xie","doi":"10.1109/CoST57098.2022.00061","DOIUrl":null,"url":null,"abstract":"For the bottom-up auditory attention process, many auditory attention models have been proposed, including the earliest four auditory saliency models developed from visual saliency models, namely Kayser model, Kalinli model, Duangudom model and Kaya model. In order to compare the correlation between the output results of the four models and subjective perception, firstly the four models were evaluated by carrying out a subjective saliency evaluation experiment in this paper. In the subjective evaluation experiment, 20 kinds of sound scene materials were scored with relative saliency and absolute saliency, and two rankings were obtained. Secondly in the saliency model, the saliency scores were calculated for the same 20 kinds of sounds, and the saliency of the sounds were scored by extracting the mean, peak, variance and dynamic characteristics of the saliency score of each sound, and then correlations were calculated between model saliency scores and two subjective scores. The conclusion was that Kalinli model had the best effect among the four models and had the highest correlation with subjective perception; among the four features of the saliency score, the variance had the highest correlation with subjective perception. The main reason for the better results of Kalinli model was that the method of extracting auditory spectrograms and features was more consistent with the auditory characteristics of human ear and the extracted features were more comprehensive. By analyzing the structure and perceptual features of the models with high correlation between model output and subjective perception, we can improve the models in the future based on the conclusions drawn, so as to enhance their performance and make them more consistent with the auditory characteristics of the human ear.","PeriodicalId":135595,"journal":{"name":"2022 International Conference on Culture-Oriented Science and Technology (CoST)","volume":"43 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2022-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2022 International Conference on Culture-Oriented Science and Technology (CoST)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/CoST57098.2022.00061","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Abstract
Many auditory attention models have been proposed for the bottom-up auditory attention process, including the four earliest auditory saliency models developed from visual saliency models: the Kayser, Kalinli, Duangudom and Kaya models. To compare how well the outputs of these four models correlate with subjective perception, this paper first evaluates them through a subjective saliency evaluation experiment. In the subjective experiment, 20 sound scene materials were rated for both relative saliency and absolute saliency, yielding two subjective rankings. Each saliency model was then applied to the same 20 sounds, and each sound's saliency was scored by extracting the mean, peak, variance and dynamic characteristics of its saliency-score time series; correlations were then computed between the model saliency scores and the two subjective scores. The results show that the Kalinli model performed best among the four models and correlated most strongly with subjective perception, and that among the four features of the saliency score, variance had the highest correlation with subjective perception. The Kalinli model's better results are mainly attributable to its method of extracting auditory spectrograms and features, which is more consistent with the auditory characteristics of the human ear, and to its more comprehensive set of extracted features. By analyzing the structure and perceptual features of the models whose outputs correlate strongly with subjective perception, the models can be improved in future work, enhancing their performance and making them more consistent with the auditory characteristics of the human ear.
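The abstract describes an evaluation pipeline of summarizing each sound's saliency-score track with a few statistics and correlating the result with subjective ratings. The sketch below is not the authors' code; it is a minimal illustration under assumed definitions (in particular, the "dynamic" feature is taken here as the mean absolute frame-to-frame change, and a Spearman rank correlation is used), with synthetic data standing in for the 20 sound scene materials.

```python
# Minimal sketch of the evaluation pipeline: per-sound feature extraction from a
# saliency-score time series, then rank correlation against subjective ratings.
# Feature definitions and data are illustrative assumptions, not the paper's code.
import numpy as np
from scipy.stats import spearmanr


def saliency_features(score_track: np.ndarray) -> dict:
    """Summary features of one sound's saliency-score time series."""
    return {
        "mean": float(np.mean(score_track)),
        "peak": float(np.max(score_track)),
        "variance": float(np.var(score_track)),
        # Hypothetical "dynamic" feature: average frame-to-frame change.
        "dynamic": float(np.mean(np.abs(np.diff(score_track)))),
    }


def rank_correlation(model_scores, subjective_scores) -> float:
    """Spearman rank correlation between model-based and subjective scores."""
    rho, _ = spearmanr(model_scores, subjective_scores)
    return float(rho)


# Synthetic example: 20 sounds, each with a saliency-score track and a subjective rating.
rng = np.random.default_rng(0)
tracks = [rng.random(500) for _ in range(20)]   # placeholder saliency-score tracks
subjective = rng.random(20)                     # placeholder subjective saliency ratings

features = [saliency_features(t) for t in tracks]
rho_variance = rank_correlation([f["variance"] for f in features], subjective)
print(f"Spearman rho (variance feature vs. subjective): {rho_variance:.3f}")
```

In the actual study, each of the four models would supply its own saliency-score tracks, and the same correlation would be computed per feature and per model to produce the comparison reported above.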