S. Gupta
2023 15th International Conference on Developments in eSystems Engineering (DeSE), published 2023-01-09
DOI: 10.1109/DeSE58274.2023.10100058
Deep Audio Embeddings and Attention Based Music Emotion Recognition
Emotion is an intricate impression present in music that is extremely hard to capture, even with refined feature engineering techniques. The emotion of a song is an important feature for various MIR tasks such as recommendation systems, music therapy, and automatic playlist generation. In this research, we investigate the application of L3-Net deep audio embeddings with attention-based deep neural network models that use positional encoding to recognize musical emotions. In addition, we constructed the main dataset for this research by combining the 4Q audio emotion dataset and the Bi-modal emotion dataset. The L3-Net deep audio embeddings serve as the input features for the neural network models, so no further feature engineering or other audio-based features are required. We propose two attention-based neural network models, one with and one without recurrent layers. The positional encoding mechanism allows the ACNN model to learn the sequential information in the audio embeddings without any recurrent layers. We therefore conclude that the ACNN model performed better than the other models, achieving an F1-score of 0.79 with the AdamP optimizer.
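The core mechanism the abstract describes, positional encoding injected into a sequence of deep audio embeddings so that an attention layer can recover order information without recurrence, can be sketched as follows. This is a minimal illustrative sketch, not the authors' architecture: the 512-dimensional embedding size, the 10-frame sequence length, and the identity query/key/value projections are all assumptions made for brevity (L3-Net embeddings are commonly 512- or 6144-dimensional).

```python
import numpy as np

def positional_encoding(seq_len, d_model):
    """Standard sinusoidal positional encoding (sin on even dims, cos on odd)."""
    pos = np.arange(seq_len)[:, None]            # (seq_len, 1)
    i = np.arange(d_model)[None, :]              # (1, d_model)
    angle = pos / np.power(10000.0, (2 * (i // 2)) / d_model)
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angle[:, 0::2])
    pe[:, 1::2] = np.cos(angle[:, 1::2])
    return pe

def self_attention(x):
    """Scaled dot-product self-attention; Q = K = V = x (no learned
    projections, to keep the sketch self-contained)."""
    d = x.shape[-1]
    scores = x @ x.T / np.sqrt(d)                # (seq_len, seq_len)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # row-wise softmax
    return weights @ x

# Stand-in for frame-level L3-Net embeddings of a clip:
# 10 frames, 512 dimensions (both values assumed for illustration).
rng = np.random.default_rng(0)
embeddings = rng.standard_normal((10, 512))

# Adding the positional encoding marks each frame with its position,
# which is what lets an attention-only model capture temporal structure.
x = embeddings + positional_encoding(10, 512)
attended = self_attention(x)
print(attended.shape)    # (10, 512)
```

In a full model along the lines the abstract suggests, the attended features would feed convolutional and dense layers ending in a softmax over the emotion quadrants, trained with an optimizer such as AdamP.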