Published in: 2022 International Conference on Culture-Oriented Science and Technology (CoST)
Pub Date : 2022-08-01 DOI: 10.1109/CoST57098.2022.00030
Xiaoyu Xin, Yinghua Shen, Rui Xiong, Xiahan Lin, Ming Yan, Wei Jiang
Title: Automatic Image Generation of Peking Opera Face using StyleGAN2
Image generation models can learn the feature distribution of real images and sample from that distribution to produce high-fidelity images. This paper focuses on feature extraction and intelligent generation for Peking opera faces, which carry distinctive Chinese cultural characteristics. After building a Peking opera face dataset, the paper compares how different variants of StyleGAN2, a style-based generator architecture for generative adversarial networks, and different dataset sizes affect the quality of the generated faces. The experimental results show that, when the dataset is small and its distribution is unbalanced, images synthesized by StyleGAN2 with the Adaptive Discriminator Augmentation (ADA) module look better and exhibit good local randomness.
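The abstract above credits the ADA module for the gains on a small dataset. As a rough illustration of the idea behind ADA, the sketch below adjusts an augmentation probability `p` from an overfitting indicator computed on the discriminator's outputs for real images. It is not code from this paper: the function names, the target value of 0.6, and the fixed step size are assumptions chosen for the example.

```python
def overfitting_indicator(real_logits):
    """Mean sign of the discriminator outputs on real training images.

    Values near 1 mean the discriminator is very confident on the
    training set, which is treated here as a sign of overfitting.
    """
    if not real_logits:
        raise ValueError("real_logits must not be empty")
    signs = [(x > 0) - (x < 0) for x in real_logits]
    return sum(signs) / len(signs)


def update_augment_probability(p, real_logits, target=0.6, step=0.01):
    """Nudge the augmentation probability p toward the target indicator.

    target and step are hypothetical values for this sketch.
    """
    r_t = overfitting_indicator(real_logits)
    if r_t > target:
        p += step  # discriminator looks overfit: augment more often
    elif r_t < target:
        p -= step  # discriminator is not overfit: augment less often
    return min(max(p, 0.0), 1.0)  # keep p a valid probability


if __name__ == "__main__":
    p = 0.5
    p = update_augment_probability(p, [2.0, 1.5, 0.3])
    print(round(p, 2))
```

In a training loop this update would run every few minibatches, so that `p` rises only while the discriminator keeps separating real images too confidently.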
Pub Date : 2022-08-01 DOI: 10.1109/cost57098.2022.00032
Xinyu Jiang, Jiangbo Xu, Ruoyu Zou
Title: A Vision Enhancement Network for Image Quality Assessment
As electronic devices continue to develop, image quality assessment has become an active research topic. Digital image processing and convolutional neural networks (CNNs) have recently made significant progress, yet models based on human visual characteristics and neural feedback have performed poorly in previous studies. Motivated by this gap, we propose a CNN-based vision enhancement network (VE-Net) that filters images adaptively according to their key regions. Key regions are extracted with the incentive support method from the deep information learned by the CNN. The adaptive filter combines a Laplacian filter and a Gaussian filter; the Laplacian filter adopts a linear lifting algorithm that attaches image texture to the original image. A squared earth mover’s distance (EMD) loss is used to predict the distribution of image aesthetic scores. VE-Net is evaluated on the AVA dataset for both the regression task and the classification task, and the experiments show the superiority of VE-Net.
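The squared EMD loss named in the abstract above compares two score distributions through their cumulative sums, so that mass placed in a distant score bin costs more than mass placed in a neighbouring one. The sketch below shows one common formulation, the mean squared difference between the two cumulative distributions. The abstract does not say whether VE-Net sums, averages, or takes a root of this quantity, so the exact normalisation here is an assumption, as is the function name.

```python
from itertools import accumulate


def squared_emd_loss(predicted, target):
    """Mean squared difference between the CDFs of two score distributions.

    Both inputs are probabilities over the same ordered score bins.
    The averaging over bins is an assumption of this sketch.
    """
    if len(predicted) != len(target):
        raise ValueError("distributions must have the same number of bins")
    if not predicted:
        raise ValueError("distributions must not be empty")
    cdf_pred = list(accumulate(predicted))
    cdf_true = list(accumulate(target))
    squared_gaps = [(a - b) ** 2 for a, b in zip(cdf_pred, cdf_true)]
    return sum(squared_gaps) / len(squared_gaps)


if __name__ == "__main__":
    truth = [0.0, 0.0, 1.0]          # all mass on the highest score
    near_miss = [0.0, 1.0, 0.0]      # one bin away
    far_miss = [1.0, 0.0, 0.0]       # two bins away
    print(squared_emd_loss(near_miss, truth))
    print(squared_emd_loss(far_miss, truth))
```

The far miss is penalised twice as heavily as the near miss in this example, which is the ordering-aware behaviour that a plain cross-entropy over bins would not provide.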
Pub Date : 2022-08-01 DOI: 10.1109/cost57098.2022.00020
Tianyue Jiang, Sanhong Deng, Peng Wu, Haibi Jiang
Title: Real-time Human-Music Emotional Interaction Based on Multimodal Analysis
Music is an important and easily accessible part of culture. Research on the sentiment expressed by music and on its effect on the listener’s emotion is growing, but existing work is often subjective and neglects the real-time expression of emotion. In this article, two labeled datasets are established. A deep learning method is used to classify music sentiment, while a decision-level fusion method is used to estimate the listener’s multimodal sentiment in real time. We combine this sentiment analysis with a traditional online music playback system and propose a human-music emotional interaction system that uses deep-learning-based multimodal sentiment analysis. Individual observation and a questionnaire survey show that the interaction between human and music sentiment has a positive influence on listeners’ negative emotions.
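Decision-level fusion, as used in the abstract above, combines the outputs of separate per-modality classifiers rather than their features. A minimal sketch of one simple variant, a weighted average of class probabilities, is given below. The abstract does not name the modalities, the weights, or the fusion rule, so the modality names (`face`, `voice`), the weights, and the averaging rule are all hypothetical.

```python
def fuse_decisions(modality_probs, weights):
    """Fuse per-modality class probabilities by weighted averaging.

    modality_probs maps a modality name to a dict of label -> probability.
    weights maps the same modality names to non-negative weights.
    Returns the winning label and the fused distribution.
    """
    total = sum(weights[m] for m in modality_probs)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    labels = list(next(iter(modality_probs.values())))
    fused = {
        label: sum(weights[m] * probs[label] for m, probs in modality_probs.items()) / total
        for label in labels
    }
    best = max(fused, key=fused.get)
    return best, fused


if __name__ == "__main__":
    # Hypothetical classifier outputs for one moment in time.
    outputs = {
        "face": {"positive": 0.2, "negative": 0.8},
        "voice": {"positive": 0.6, "negative": 0.4},
    }
    label, fused = fuse_decisions(outputs, {"face": 0.5, "voice": 0.5})
    print(label, fused)
```

Because each modality is classified independently, a missing or unreliable modality can be handled by lowering its weight without retraining the other classifiers.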
Pub Date : 2022-08-01 DOI: 10.1109/cost57098.2022.00047
Jinchu Zhou, Ying Wang, Bo Li
Title: Research on the practical path of Chinese movie and television communication from the perspective of “The Belt and Road”
Against the background of the Belt and Road, Chinese movies face both opportunities and difficulties in international communication. To explore feasible ways of promoting the diffusion of Chinese movies abroad, we take 36 countries along the “Belt and Road” from December 2017 to December 2020 as the research object and use the latent Dirichlet allocation (LDA) topic model to summarize six categories of topics. According to the box office distribution of each topic, the countries are divided into six categories, and we discuss the topic preferences of each country separately. To promote movie diffusion, we make three recommendations. For regions with significant cultural differences, movies should be exported according to the regions’ preferred topics. For regions with a well-developed movie market but little cooperation, we should export high-quality domestic movies and pursue co-productions at the same time. For regions with deep cultural exchanges and a weak economy, we should export movies that show cultural diversity and convey national spirit. In addition, multiple factors such as political and religious background must be considered to avoid breaking group taboos, and export plans should be adapted to local conditions to further multi-ethnic and multicultural integration.
Pub Date : 2022-08-01 DOI: 10.1109/CoST57098.2022.00086
Xinran Ba, Libiao Jin, Zhou Li, Chang Liu, Sidong Li
Title: Performance Evaluation of Multi-Access Based on ATSSS Rules
Fifth-generation mobile communication technology (5G) uses an extended capability, access traffic steering, switching, and splitting (ATSSS), to enable multipath transmission of data; this capability is currently being standardized. Both academia and industry are studying multi-access technologies based on the ATSSS function. This study follows the principle of minimally modifying the existing 5G protocol data unit session management and builds a multi-access network architecture to achieve multipath transmission of data. The results reveal that multi-access technology can significantly improve average user throughput and network load balancing.
Pub Date : 2022-08-01 DOI: 10.1109/CoST57098.2022.00056
Shuo Lei, Siyi Tian, Qiming Huang, Anyi Huang
Title: A real-time localization algorithm based on feature point matching
When watching a video, people can roughly estimate the direction and displacement of the camera movement from the change between an earlier and a later frame. We have studied this phenomenon and quantified it with a computer. In this paper, we compare the main current techniques for image feature point extraction and search matching, and propose an algorithm that relies on camera images to calculate instantaneous displacement. The algorithm is well suited to guided robots, which are generally equipped with cameras. Building on feature point extraction and matching, the algorithm calculates the camera displacement between two samples through a homography matrix transformation and camera calibration. We have conducted several comprehensive experiments in multiple environments and analyze the proposed algorithm based on the experimental results.
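The abstract above derives camera displacement from a homography between two frames plus camera calibration. The sketch below illustrates only the last step under strong simplifying assumptions: the 3x3 homography `H` is already estimated from matched feature points, the scene is a plane viewed at a fixed distance, and calibration reduces to a single hypothetical constant `metres_per_pixel`. It reports the apparent shift of a reference point; mapping that shift to the camera's own motion depends on the actual calibration and geometry, which the abstract does not detail.

```python
import math


def apply_homography(H, point):
    """Map an image point through a 3x3 homography with projective division."""
    x, y = point
    xh = H[0][0] * x + H[0][1] * y + H[0][2]
    yh = H[1][0] * x + H[1][1] * y + H[1][2]
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    if w == 0:
        raise ValueError("point maps to infinity under this homography")
    return (xh / w, yh / w)


def apparent_shift(H, reference_point, metres_per_pixel):
    """Shift of a reference point between two frames, scaled to metres.

    metres_per_pixel is a hypothetical calibration constant for this sketch.
    Returns (dx, dy, distance).
    """
    x0, y0 = reference_point
    x1, y1 = apply_homography(H, reference_point)
    dx = (x1 - x0) * metres_per_pixel
    dy = (y1 - y0) * metres_per_pixel
    return dx, dy, math.hypot(dx, dy)


if __name__ == "__main__":
    # A pure-translation homography: content moves 5 px right and 3 px up.
    H = [[1.0, 0.0, 5.0],
         [0.0, 1.0, -3.0],
         [0.0, 0.0, 1.0]]
    print(apply_homography(H, (10.0, 10.0)))
    print(apparent_shift(H, (10.0, 10.0), metres_per_pixel=0.01))
```

In practice the homography would come from a robust estimator run on the matched feature points, and the reference point is often taken to be the image centre.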
Pub Date : 2022-08-01 DOI: 10.1109/CoST57098.2022.00061
Xiansong Xiong, Zhijun Zhao, Lingyun Xie
Title: Evaluation of Auditory Saliency Model Based on Saliency Map
Many auditory attention models have been proposed for the bottom-up auditory attention process, including the four earliest auditory saliency models developed from visual saliency models: the Kayser, Kalinli, Duangudom and Kaya models. To compare how well the outputs of these four models correlate with subjective perception, this paper first carried out a subjective saliency evaluation experiment in which 20 kinds of sound scene materials were scored for relative saliency and absolute saliency, yielding two rankings. Second, each model computed saliency scores for the same 20 sounds; the saliency of each sound was summarized by the mean, peak, variance and dynamic characteristics of its saliency score, and correlations were then calculated between the model saliency scores and the two subjective scores. The Kalinli model performed best among the four models and had the highest correlation with subjective perception; among the four features of the saliency score, the variance had the highest correlation with subjective perception. The main reason for the better results of the Kalinli model was that its method of extracting auditory spectrograms and features was more consistent with the auditory characteristics of the human ear, and the extracted features were more comprehensive. Analyzing the structure and perceptual features of the models whose output correlates highly with subjective perception can guide future improvements, enhancing model performance and making the models more consistent with the auditory characteristics of the human ear.
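The evaluation procedure in the abstract above reduces each sound's saliency score to summary features and then correlates those features with subjective ratings. A minimal sketch of that pipeline follows. It covers the mean, peak and variance features only, because the abstract does not define the dynamic characteristic, and it uses the Pearson coefficient as an assumption, since the abstract does not say which correlation measure was used. The sample numbers are invented for illustration.

```python
import math
from statistics import fmean, pvariance


def summarize(saliency_scores):
    """Mean, peak and variance of one sound's saliency score over time."""
    return {
        "mean": fmean(saliency_scores),
        "peak": max(saliency_scores),
        "variance": pvariance(saliency_scores),
    }


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = fmean(xs), fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (sx * sy)


if __name__ == "__main__":
    # Invented saliency traces for three sounds and invented subjective scores.
    traces = [[0.1, 0.2, 0.1], [0.1, 0.9, 0.2], [0.5, 0.5, 0.6]]
    subjective = [1.0, 4.5, 2.0]
    variances = [summarize(t)["variance"] for t in traces]
    print(pearson(variances, subjective))
```

Repeating the last step for each feature and each model gives a table of correlations from which the best model and the most informative feature can be read off.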
Pub Date : 2022-08-01 DOI: 10.1109/CoST57098.2022.00043
Yixin Tai, Yu Yang, Xiaotian Wang
Title: Development of VR Motion Sickness Test Platform Based on UE
Virtual Reality (VR) technology is considered an important technical foundation for the Metaverse. However, the discomfort caused by VR, especially VR motion sickness, greatly affects the user experience, so it is particularly important to study VR comfort. In this project, a motion sickness test platform based on Unreal Engine (UE) was developed to measure and improve VR comfort. The platform creates virtual three-dimensional scenes and can adjust parameters such as the rotation angular velocity and axis, the height of the virtual camera, and the number of black and white stripes. The rotation angular velocity and axis of the virtual camera were varied to verify the usability of the platform. The results indicate that VR motion sickness is aggravated to some extent as angular velocity increases, and that it is more intense when rotating around the X and Y axes than when rotating around the Z axis. In addition, women are more likely to experience VR motion sickness than men.
Pub Date : 2022-08-01 DOI: 10.1109/CoST57098.2022.00083
Yingying Lv, Jingtao Wang, Wenbo Wu, Yun Pan
Title: Performance comparison of deep learning methods on hand bone segmentation and bone age assessment
Bone age is the biological age that reflects the growth and development of the human body. Bone age assessment plays an important role in clinical medicine, sports science and judicial practice. Well-chosen convolutional neural network (CNN) models can greatly improve the accuracy and efficiency of bone age assessment. By comparing hand bone segmentation models trained with several classical convolutional neural networks, we found that, with intersection over union (IoU) and the Dice similarity coefficient (Dice) as evaluation metrics, the segmentation model trained with U-Net performed best. Its IoU reached 0.9746 and its Dice reached 0.9871. This contradicts our prior expectation that the U-Net++ model is superior to the U-Net model. Based on the images segmented by U-Net, we applied five common convolutional neural networks to bone age prediction, with mean absolute error (MAE) and the accuracy of errors within two years as evaluation metrics. The results showed that the MAE of Xception was 7.635 and that its accuracy of errors within two years reached 97.59%. In this paper, we provide an optimal scheme for bone age image segmentation and bone age assessment, and a theoretical basis for the design of bone age assessment systems.
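The two segmentation metrics reported above, IoU and Dice, are both computed from the overlap between a predicted mask and a ground-truth mask. The sketch below computes them for flattened binary masks and also shows the standard identity Dice = 2·IoU / (1 + IoU), which holds when both metrics are computed on the same pair of masks. The function names and the toy masks are assumptions made for the example.

```python
def iou_and_dice(pred, truth):
    """IoU and Dice for two flattened binary masks of equal length."""
    if len(pred) != len(truth):
        raise ValueError("masks must have the same number of pixels")
    intersection = sum(1 for p, t in zip(pred, truth) if p and t)
    pred_area = sum(1 for p in pred if p)
    truth_area = sum(1 for t in truth if t)
    union = pred_area + truth_area - intersection
    if union == 0:
        return 1.0, 1.0  # both masks empty: treat as a perfect match
    iou = intersection / union
    dice = 2 * intersection / (pred_area + truth_area)
    return iou, dice


def dice_from_iou(iou):
    """Dice coefficient implied by an IoU value on the same pair of masks."""
    return 2 * iou / (1 + iou)


if __name__ == "__main__":
    pred = [1, 1, 1, 0]
    truth = [0, 1, 1, 1]
    print(iou_and_dice(pred, truth))
    # The reported IoU of 0.9746 corresponds to a Dice of about 0.9871.
    print(round(dice_from_iou(0.9746), 4))
```

Applying the identity to the reported IoU of 0.9746 gives approximately 0.9871, which matches the reported Dice value, so the two figures are mutually consistent.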
Pub Date : 2022-08-01 DOI: 10.1109/CoST57098.2022.00010
Chunxiao Wang, Jingiing Zhang, Lihong Gan, Wei Jiang
Title: A Prediction Method for Dimensional Sentiment Analysis of the Movie and TV Drama based on Variable-length Sequence Input
Time-continuous emotion prediction has long been one of the difficulties in affective video content analysis. Current research mainly predicts emotion continuously over a long video by dividing it into short segments of fixed duration. These methods ignore both the temporal dependencies between short video clips and the mood changes within them. Therefore, drawing on the concepts of film and television narrative structure in cinematic language, this paper proposes a prediction method for dimensional sentiment analysis of movies and TV dramas based on variable-length sequence inputs. First, the paper defines a method for partitioning variable-length audiovisual sequences that sets the subunits of dimensional emotion prediction as variable-length inputs. Then, a method for extracting and combining the audio and visual features of each variable-length audiovisual sequence is proposed. Finally, a prediction network for dimensional emotion is designed around these variable-length inputs. The paper focuses on dimensional sentiment prediction and evaluates the proposed method on the extended COGNIMUSE dataset. The method achieves performance comparable to other methods while increasing the prediction speed, with the Mean Square Error (MSE) reduced from 0.13 to 0.11 for arousal and from 0.19 to 0.13 for valence.