A Novel Correntropy Analysis Method with Application to Multi-view Feature Representation
Pub Date: 2021-09-01 DOI: 10.1109/MIPR51284.2021.00034
Published in: 2021 IEEE 4th International Conference on Multimedia Information Processing and Retrieval (MIPR)
Lei Gao, L. Guan
In this paper, a novel correntropy analysis (CORA) method is proposed for multi-view feature representation. By jointly utilizing correntropy and nonlinear kernel transformation tools, the CORA method measures the localized similarity between two random variables and reveals the intrinsic relation between them, leading to a high-quality feature representation. Unlike many existing feature representation techniques, such as canonical correlation analysis (CCA) and kernel CCA (KCCA), CORA characterizes and explores the mutual relation of two random variables according to their probability density. In addition, unlike kernel entropy component analysis (KECA), which reveals structural information from only a single data space, CORA explores the mutual structural information between two data spaces jointly. The effectiveness of the proposed method is evaluated through experiments on audio emotion recognition and face recognition. Comparisons are conducted with statistical machine learning (SML) and deep neural network (DNN) based algorithms. The results show that the proposed CORA method outperforms the compared methods.
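To make the notion of "localized similarity" concrete, the sketch below estimates the standard sample correntropy of two paired sequences, V(X, Y) = E[k_sigma(X - Y)], with a Gaussian kernel. It illustrates the underlying similarity measure only and is not the CORA method itself; the function names, the kernel width `sigma`, and the sample data are assumptions made for illustration.

```python
import math


def gaussian_kernel(u, sigma):
    """Gaussian kernel evaluated at a scalar difference u."""
    return math.exp(-(u * u) / (2.0 * sigma * sigma)) / (math.sqrt(2.0 * math.pi) * sigma)


def sample_correntropy(x, y, sigma=1.0):
    """Sample estimate of V(X, Y) = E[k_sigma(X - Y)] from paired samples."""
    if len(x) != len(y) or not x:
        raise ValueError("x and y must be non-empty and of equal length")
    return sum(gaussian_kernel(a - b, sigma) for a, b in zip(x, y)) / len(x)


# Illustrative data (not from the paper): one nearby and one distant partner for x.
x = [0.1, 0.4, 0.9, 1.3]
y_close = [0.1, 0.5, 0.8, 1.3]
y_far = [2.0, -1.5, 3.0, -0.7]

v_close = sample_correntropy(x, y_close)
v_far = sample_correntropy(x, y_far)
print(v_close > v_far)
```

Because the kernel decays quickly with the difference X - Y, pairs that lie close together dominate the estimate, which is why correntropy is usually described as a local similarity measure; `sigma` controls how local it is.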
Detecting and Preventing Faked Mixed Reality
Pub Date: 2021-09-01 DOI: 10.1109/MIPR51284.2021.00074
Fabian Kilger, Alexandre Kabil, Volker Tippmann, G. Klinker, Marc-Oliver Pahl
Virtualized collaboration can significantly improve the remote management of critical infrastructures. Crises such as the current COVID-19 pandemic push the technology forward: they require remote management to keep our infrastructures running. Mixed Reality (MR) prototypes enable remote management in diverse fields such as medicine, Industry 4.0, energy systems, education, and cyber awareness. However, virtualized collaboration is still at an early stage of its evolution. By design, MR is fake: its reality is generated from models. This makes detecting attacks very difficult. Many MR attacks result from well-known cybersecurity threats. This paper identifies classic attack surfaces, vectors, and concrete threats that are relevant for MR. It presents mitigation methods that can help to secure the underlying data exchanges. However, distributed systems are often heterogeneous and under different management authorities, which makes securing the entire virtualized remote management stack difficult. The paper therefore also introduces considerations towards MR-client-based attack detection, i.e., MR forensics, including relevant features and the use of machine learning.
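The abstract does not name the specific mitigations it proposes. As one generic example of securing a data exchange, the sketch below authenticates a scene update with an HMAC so that a client can reject tampered messages. The message fields, the shared key, and the function names are all hypothetical and chosen only for illustration.

```python
import hashlib
import hmac
import json


def sign_update(update, key):
    """Serialize a scene update deterministically and compute its HMAC-SHA256 tag."""
    payload = json.dumps(update, sort_keys=True).encode("utf-8")
    tag = hmac.new(key, payload, hashlib.sha256).hexdigest()
    return payload, tag


def verify_update(payload, tag, key):
    """Return True only if the tag matches the payload under the shared key."""
    expected = hmac.new(key, payload, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, tag)


key = b"demo-shared-secret"  # hypothetical key; real systems need proper key management
payload, tag = sign_update({"object": "valve-7", "pose": [1.0, 0.5, 2.0]}, key)

# An attacker who rewrites the object identifier cannot produce a valid tag without the key.
tampered = payload.replace(b"valve-7", b"valve-9")
print(verify_update(payload, tag, key), verify_update(tampered, tag, key))
```

Message authentication of this kind protects integrity in transit only; it does not help when the sending endpoint itself is compromised, which is the gap that client-side detection is meant to address.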
The JinYue Database for Huqin Music Emotion, Scene and Imagery Recognition
Pub Date: 2021-09-01 DOI: 10.1109/MIPR51284.2021.00059
Kejun Zhang, Xinda Wu, Ruiyuan Tang, Qiaoqiao Huang, Chang-yuan Yang, Hui Zhang
Traditional Chinese music is a great treasure for China and the rest of the world. It is accompanied by a variety of traditional musical instruments and features melodies distinctive to different dynasties. Developing an efficient music retrieval system for traditional Chinese music requires a large amount of such music data with rich and accurate annotations. However, existing databases usually cover popular and contemporary music, a basic taxonomy, and a single task. In this work, we introduce the JinYue database of more than 1,000 pieces of music played by variants of the huqin (huqin music), spanning the 20th century to date. The database includes over 10,000 annotations of huqin music in terms of discrete emotion, scene, and imagery labels. We provide extensive benchmarks of multi-class classification results for emotion, scene, and imagery along with the database. Furthermore, because of copyright restrictions, we develop the JinYue Music Exploring System to provide information on the over 1,000 pieces of huqin music, including metadata, audio features, and annotations. We will continue to collect music from more categories of Chinese musical instruments to enrich the JinYue database. This database aims to advance research in affective computing, music information retrieval, and beyond.
Preference Analysis of Shopping Malls’ Followers and Keyword Recommendation on Twitter
Pub Date: 2021-09-01 DOI: 10.1109/MIPR51284.2021.00055
Mantaro Yamada, Xueting Wang, T. Yamasaki
In this work, we analyze the preference features of shopping malls’ followers by examining their "following" and "like" behavior on Twitter. The analysis reveals their preferred topics and the differences among shopping malls, which can be used for commercial applications such as effective promotion, marketing, or branding strategy. In addition, we propose a follower-oriented keyword recommendation method that leverages the followers’ preferences. The method recommends keywords to use in a tweet to enhance its popularity with the followers, and thus more directly helps shopping malls use Twitter effectively for commercial applications.
AIBO – A Sicko AI Brainwave Opera
Pub Date: 2021-09-01 DOI: 10.1109/MIPR51284.2021.00060
E. Pearlman
OpenAI created the algorithm GPT-2 (Generative Pretrained Transformer 2), since succeeded by GPT-3, in February 2019. The algorithm creates imitations of human dialogue, producing fake but surprisingly realistic interactions. Using GPT-2, a ‘sicko’ AI was created as a live-time entity running in the Google cloud. AIBO (Artificial Intelligent Brainwave Opera) was one of two characters, the other being a human wearing a brain-computer interface, both part of an emotionally intelligent artificial intelligence brainwave opera. The opera asked two questions: "Can an AI be fascist?" and "Can an AI have epigenetic, or inherited traumatic memory?" This paper discusses aspects involved in building the GPT-2 cloud-based character AIBO and its synthetic emotions in a performative spoken word opera.
Recalibration of Structured-Light RGB-D Cameras with Parametric Depth Error Correction
Pub Date: 2021-09-01 DOI: 10.1109/MIPR51284.2021.00024
Peng-Yuan Kao, S. Shih, Y. Hung, Aye Mon Tun
Structured-light RGB-D cameras have been widely used in various applications. However, due to the deformation of internal camera parts, their depth estimation accuracy degrades over time. While it is easy to calibrate the camera parameters, writing the calibrated parameters back to the camera firmware is difficult. Therefore, existing methods compensate the depth measurements with various error correction functions. At present, because no simple and accurate parametric error correction method exists, nonparametric calibration methods must be used when accurate depth measurements are required. The main drawback of such nonparametric approaches is that they require a large number of calibration images to fill a large error correction lookup table. In this paper, we propose a simple parametric depth error correction model based on a Taylor-series approximation of the depth measurement equations. Experimental results show that the proposed method outperforms other parametric approaches and achieves results comparable to the state-of-the-art nonparametric method, even though it uses only nine parameters.
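The abstract does not spell out the nine-parameter model, so the following is only a sketch of the kind of Taylor-series reasoning involved. It assumes the standard triangulation relation z = f * b / d (focal length f in pixels, baseline b in metres, disparity d in pixels) and shows that a small disparity error changes the depth by approximately -(z^2 / (f * b)) times that error. The numeric values and function names are hypothetical.

```python
def depth_from_disparity(d, focal_px, baseline_m):
    """Triangulated depth z = f * b / d."""
    return focal_px * baseline_m / d


def first_order_depth_error(z, disparity_error, focal_px, baseline_m):
    """First-order Taylor estimate of the depth change caused by a disparity error.

    Since dz/dd = -f * b / d**2 = -z**2 / (f * b), the depth error grows
    quadratically with the depth itself.
    """
    return -(z * z) / (focal_px * baseline_m) * disparity_error


# Hypothetical camera values, chosen only for illustration.
focal_px, baseline_m = 580.0, 0.075
d = 29.0
delta = 0.1  # assumed disparity error in pixels

z = depth_from_disparity(d, focal_px, baseline_m)
exact = depth_from_disparity(d + delta, focal_px, baseline_m) - z
approx = first_order_depth_error(z, delta, focal_px, baseline_m)
print(z, exact, approx)
```

The exact and first-order values agree closely for small errors, which is what makes a low-order parametric correction plausible as an alternative to a dense lookup table.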
Weakly-Supervised Damaged Building Localization and Assessment with Noise Regularization
Pub Date: 2021-09-01 DOI: 10.1109/MIPR51284.2021.00009
Maria Presa-Reyes, Shu‐Ching Chen
Not only does the destruction caused by natural disasters impair human lives, but it can also result in devastating damage to community infrastructure and possibly cause the loss of historic structures as well as vital documents. Technological advances in remote sensing survey tools such as satellite images and aerial photographs have allowed emergency responders to rapidly and remotely conduct a comprehensive assessment of the damage caused by a disaster event. Most previously proposed research on the automatic identification and prediction of building damage from optical remote sensing data depends on the availability of accurate geometric footprints of the structures in the affected area. However, the available building footprints may rapidly become outdated as new structures are built while old ones are demolished or renovated. We propose an end-to-end weakly-supervised damage assessment model that assumes the building footprint is unknown during training. Instead, only a rough estimate of the building’s location and the level of damage it sustained is available. Ablation tests are conducted on both a large-scale satellite imagery set and a smaller set of aerial photographs prepared and curated by our team to demonstrate the proposed model’s performance.
One-Shot Example Videos Localization Network for Weakly-Supervised Temporal Action Localization
Pub Date: 2021-09-01 DOI: 10.1109/MIPR51284.2021.00026
Yushu Liu, Weigang Zhang, Guorong Li, Li Su, Qingming Huang
This paper tackles the problem of example-driven weakly-supervised temporal action localization. We propose the One-shot Example Videos Localization Network (OSEVLNet) for precisely localizing the action instances in untrimmed videos with only one trimmed example video. Since frame-level ground truth is unavailable under weakly-supervised settings, our approach automatically trains a self-attention module with reconstruction and feature discrepancy restrictions. Specifically, the reconstruction restriction minimizes the discrepancy between the original input features and the reconstructed features of a Variational AutoEncoder (VAE) module. The feature discrepancy restriction maximizes the distance between the weighted features of highly-responsive regions and those of slightly-responsive regions. Our approach achieves comparable or better results on the THUMOS’14 dataset than other weakly-supervised methods while being trained with far fewer videos. Moreover, our approach is especially suitable for extending to newly emerging action categories to meet the requirements of different occasions.
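The two restrictions described above can be sketched as simple loss terms. This is a minimal illustration under stated assumptions, not the OSEVLNet implementation: the reconstruction term is taken to be a mean squared error, the discrepancy term a Euclidean distance between attention-weighted mean features, and the VAE is replaced by a given list of reconstructed features. All names and values are hypothetical.

```python
import math


def reconstruction_loss(features, reconstructed):
    """Mean squared error between original and reconstructed snippet features."""
    total, count = 0.0, 0
    for f, r in zip(features, reconstructed):
        for a, b in zip(f, r):
            total += (a - b) ** 2
            count += 1
    return total / count


def weighted_mean(features, weights):
    """Weighted average of feature vectors (weights must not sum to zero)."""
    norm = sum(weights)
    dim = len(features[0])
    return [sum(w * f[i] for f, w in zip(features, weights)) / norm for i in range(dim)]


def feature_discrepancy(features, attention):
    """Distance between high-response and low-response weighted mean features."""
    high = weighted_mean(features, attention)
    low = weighted_mean(features, [1.0 - a for a in attention])
    return math.sqrt(sum((h - l) ** 2 for h, l in zip(high, low)))


# Toy snippet features: the first two snippets resemble each other, as do the last two.
features = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
attention_sharp = [0.9, 0.9, 0.1, 0.1]  # separates the two groups
attention_flat = [0.5, 0.5, 0.5, 0.5]   # no separation

d_sharp = feature_discrepancy(features, attention_sharp)
d_flat = feature_discrepancy(features, attention_flat)
print(d_sharp, d_flat)
```

In a training loop, one would minimize the reconstruction term while maximizing the discrepancy term, for example by minimizing `reconstruction_loss(...) - lam * feature_discrepancy(...)` for some assumed weight `lam`.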
Music Emotion Recognition through Sparse Canonical Correlation Analysis
Pub Date: 2021-09-01 DOI: 10.1109/MIPR51284.2021.00066
Hongwei Li, Hongjian Bo, Lin Ma, Lexiang Wang, Haifeng Li
For centuries, music has been an important part of various cultures and a special language for humans to express their thoughts and emotions. Music emotion plays an important role in music retrieval, mood detection, and other music-related applications, and music emotion recognition (MER) has become a research hotspot worldwide. Traditional music emotion recognition ignores the fact that the subject of emotion is human: music acts on the brain to finally produce emotions. Therefore, this paper studies the mapping relationship between music features and EEG features. Through sparse canonical correlation analysis, the music features are projected onto the EEG features to obtain new music feature vectors containing EEG information. A support vector machine was trained and tested on the new music feature vectors, and good recognition results were obtained on both a self-built database and a public database. The proposed method exploits the advantage of EEG signals, which can reflect emotional expression most intuitively and accurately. At the same time, the method has good transferability: when the EEG samples are representative, the projection vector is universal and can be used directly on other music databases.
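The abstract does not give the sparse CCA formulation, so the sketch below only illustrates the two ingredients such a method typically combines: the canonical correlation objective (the Pearson correlation between the two projected views) and a sparsity step, here assumed to be soft-thresholding of the projection weights. The data, weights, and threshold are invented for illustration.

```python
import math


def project(rows, weights):
    """Project each feature row onto a weight vector."""
    return [sum(w * v for w, v in zip(weights, row)) for row in rows]


def pearson(u, v):
    """Pearson correlation between two equally long sequences."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    su = math.sqrt(sum((a - mu) ** 2 for a in u))
    sv = math.sqrt(sum((b - mv) ** 2 for b in v))
    return cov / (su * sv)


def soft_threshold(weights, lam):
    """Shrink weights toward zero; entries smaller than lam in magnitude become zero."""
    return [math.copysign(max(abs(w) - lam, 0.0), w) for w in weights]


# Toy data: four clips with three music features and two EEG features each.
music = [[1.0, 5.0, 0.2], [2.0, 3.0, 0.1], [3.0, 4.0, 0.3], [4.0, 1.0, 0.2]]
eeg = [[2.1, 0.5], [3.9, 0.4], [6.2, 0.6], [8.0, 0.5]]

a = [0.9, 0.05, 0.02]             # assumed dense music projection
a_sparse = soft_threshold(a, 0.1)  # keeps only the dominant music feature
b = [1.0, 0.0]                    # assumed EEG projection

corr = pearson(project(music, a_sparse), project(eeg, b))
print(a_sparse, corr)
```

Once a projection like `a_sparse` has been learned from paired music and EEG data, it can be applied to music features alone, which is consistent with the transferability argument in the abstract.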
MemoMusic: A Personalized Music Recommendation Framework Based on Emotion and Memory
Pub Date: 2021-09-01 DOI: 10.1109/MIPR51284.2021.00064
Luntian Mou, Jueying Li, Juehui Li, Feng Gao, Ramesh C. Jain, Baocai Yin
Music is universally recognized as an effective way for humans to express emotion and regulate emotional states. However, perceived music emotion is subjective and depends heavily on culture, environment, and life experience. Therefore, personalized music recommendation is necessary both to satisfy users and to guide a listener toward a more positive emotional state. Existing work on emotion-based and personalized music recommendation often fails to consider the impact of past life experiences on the perception of music emotion. We argue that memories associated with music could play a vital role in determining the new emotional state after listening. To verify this hypothesis, we propose a personalized music recommendation framework called MemoMusic, which estimates the new emotional state of a listener based on the individual’s current emotional state and possible memories associated with the music being listened to. For the preliminary experiment, a dataset of 60 piano pieces from three categories (Classical, Popular, and Yanni music) was collected and labelled using the Valence-Arousal model. Experimental results demonstrate that memory is indeed an important factor in determining perceived music emotion. Moreover, MemoMusic, based on emotion and memory, performs well in improving a listener’s emotional state.