{"title":"角边际软最大损失及其变体双压缩AMR音频检测","authors":"Aykut Büker, C. Hanilçi","doi":"10.1145/3437880.3460414","DOIUrl":null,"url":null,"abstract":"Double compressed (DC) adaptive multi-rate (AMR) audio detection is an important but challenging audio forensic task which has received great attention over the last decade. Although the majority of the existing studies extract hand-crafted features and classify these features using traditional pattern matching algorithms such as support vector machines (SVM), recently convolutional neural network (CNN) based DC AMR audio detection system was proposed which yields very promising detection performance. Similar to any traditional CNN based classification system, CNN based DC AMR recognition system uses standard softmax loss as the training criterion. In this paper, we propose to use angular margin softmax loss and its variants for DC AMR detection problem. Although using angular margin softmax was originally proposed for face recognition, we adapt it to the CNN based end-to-end DC audio detection system. The angular margin softmax basically introduces a margin between two classes so that the system can learn more discriminative embeddings for the problem. Experimental results show that adding angular margin penalty to the traditional softmax loss increases the average DC AMR audio detection from 95.83% to 100%. It is also found that the angular margin softmax loss functions boost the DC AMR audio detection performance when there is a mismatch between training and test datasets.","PeriodicalId":120300,"journal":{"name":"Proceedings of the 2021 ACM Workshop on Information Hiding and Multimedia Security","volume":null,"pages":null},"PeriodicalIF":0.0000,"publicationDate":"2021-06-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"2","resultStr":"{\"title\":\"Angular Margin Softmax Loss and Its Variants for Double Compressed AMR Audio Detection\",\"authors\":\"Aykut Büker, C. Hanilçi\",\"doi\":\"10.1145/3437880.3460414\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Double compressed (DC) adaptive multi-rate (AMR) audio detection is an important but challenging audio forensic task which has received great attention over the last decade. Although the majority of the existing studies extract hand-crafted features and classify these features using traditional pattern matching algorithms such as support vector machines (SVM), recently convolutional neural network (CNN) based DC AMR audio detection system was proposed which yields very promising detection performance. Similar to any traditional CNN based classification system, CNN based DC AMR recognition system uses standard softmax loss as the training criterion. In this paper, we propose to use angular margin softmax loss and its variants for DC AMR detection problem. Although using angular margin softmax was originally proposed for face recognition, we adapt it to the CNN based end-to-end DC audio detection system. The angular margin softmax basically introduces a margin between two classes so that the system can learn more discriminative embeddings for the problem. Experimental results show that adding angular margin penalty to the traditional softmax loss increases the average DC AMR audio detection from 95.83% to 100%. It is also found that the angular margin softmax loss functions boost the DC AMR audio detection performance when there is a mismatch between training and test datasets.\",\"PeriodicalId\":120300,\"journal\":{\"name\":\"Proceedings of the 2021 ACM Workshop on Information Hiding and Multimedia Security\",\"volume\":null,\"pages\":null},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2021-06-17\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"2\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings of the 2021 ACM Workshop on Information Hiding and Multimedia Security\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/3437880.3460414\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 2021 ACM Workshop on Information Hiding and Multimedia Security","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3437880.3460414","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Angular Margin Softmax Loss and Its Variants for Double Compressed AMR Audio Detection
Double compressed (DC) adaptive multi-rate (AMR) audio detection is an important but challenging audio forensic task which has received great attention over the last decade. Although the majority of the existing studies extract hand-crafted features and classify these features using traditional pattern matching algorithms such as support vector machines (SVM), recently convolutional neural network (CNN) based DC AMR audio detection system was proposed which yields very promising detection performance. Similar to any traditional CNN based classification system, CNN based DC AMR recognition system uses standard softmax loss as the training criterion. In this paper, we propose to use angular margin softmax loss and its variants for DC AMR detection problem. Although using angular margin softmax was originally proposed for face recognition, we adapt it to the CNN based end-to-end DC audio detection system. The angular margin softmax basically introduces a margin between two classes so that the system can learn more discriminative embeddings for the problem. Experimental results show that adding angular margin penalty to the traditional softmax loss increases the average DC AMR audio detection from 95.83% to 100%. It is also found that the angular margin softmax loss functions boost the DC AMR audio detection performance when there is a mismatch between training and test datasets.