
Latest publications: 2022 International Conference on Culture-Oriented Science and Technology (CoST)

Automatic Image Generation of Peking Opera Face using StyleGAN2
Pub Date : 2022-08-01 DOI: 10.1109/CoST57098.2022.00030
Xiaoyu Xin, Yinghua Shen, Rui Xiong, Xiahan Lin, Ming Yan, Wei Jiang
Image generation technology, widely used in intelligent image synthesis applications, learns the feature distribution of real images and samples from that distribution to produce high-fidelity generated images. This paper focuses on feature extraction and intelligent generation techniques for Peking opera faces, which carry distinctive Chinese cultural characteristics. After building a Peking opera face dataset, the paper compares how different variants of the style-based GAN generator architecture (StyleGAN2) and different dataset sizes affect the quality of face generation. The experimental results verify that when the dataset is small and unbalanced in distribution, the synthetic images generated by StyleGAN2 with the Adaptive Discriminator Augmentation (ADA) module are visually better and exhibit good local randomness.
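The ADA module strengthens discriminator augmentation adaptively when overfitting is detected on a small dataset. A minimal sketch of one common ADA-style heuristic, where the augmentation probability is nudged toward a target overfitting level; the target value and step size here are illustrative assumptions, not values from the paper:

```python
import numpy as np

def update_ada_p(p, d_real_logits, target=0.6, adjust_speed=0.01):
    """One ADA-style update (sketch): raise the augmentation probability p
    when the discriminator looks overfit (r_t above target), lower it
    otherwise, and keep p within [0, 1]."""
    r_t = np.mean(np.sign(d_real_logits))  # overfitting heuristic on real logits
    p += adjust_speed * (1 if r_t > target else -1)
    return float(np.clip(p, 0.0, 1.0))
```

In training, this update would run every few minibatches, so augmentation strength tracks how easily the discriminator separates real from generated images.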
Citations: 0
A Vision Enhancement Network for Image Quality Assessment
Pub Date : 2022-08-01 DOI: 10.1109/cost57098.2022.00032
Xinyu Jiang, Jiangbo Xu, Ruoyu Zou
With the development and upgrading of electronic equipment, image quality assessment has become a hot topic. Digital image processing and convolutional neural networks (CNN) have recently made significant progress, yet models based on human vision characteristics and neural feedback have performed poorly in previous studies. Motivated by this, we propose a CNN-based vision enhancement network (VE-Net). It filters images adaptively according to key regions, which are extracted with an incentive support method from the deep information learned by the CNN. The adaptive filter combines a Laplacian filter and a Gaussian filter; the Laplacian filter adopts a linear lifting algorithm that reattaches image texture to the original image. A squared earth mover's distance (EMD) loss is used to predict the image aesthetic score distribution. VE-Net is evaluated on the AVA dataset on both the regression and classification tasks, and the experiments show its superiority.
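The squared EMD loss compares two discrete score distributions through their cumulative distributions, which respects the ordering of an aesthetic rating scale. A minimal NumPy sketch, assuming both inputs are normalized histograms over the same ordered scale:

```python
import numpy as np

def squared_emd_loss(p, q):
    """Squared earth mover's distance between two discrete score
    distributions (sketch): mean squared difference of their CDFs."""
    cdf_p = np.cumsum(p)
    cdf_q = np.cumsum(q)
    return float(np.mean((cdf_p - cdf_q) ** 2))
```

Unlike cross-entropy, this loss penalizes a prediction more the farther its probability mass sits from the ground-truth rating bins.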
Citations: 0
Real-time Human-Music Emotional Interaction Based on Multimodal Analysis
Pub Date : 2022-08-01 DOI: 10.1109/cost57098.2022.00020
Tianyue Jiang, Sanhong Deng, Peng Wu, Haibi Jiang
Music, an important and easily accessible part of culture, occupies a significant position. Research on the sentiment expressed by music and its effect on the listener's emotion is growing, but existing studies are often subjective and neglect the real-time expression of emotion. In this article, two labeled datasets are established. A deep learning method is used to classify music sentiment, while a decision-level fusion method handles real-time multimodal listener sentiment. We combine this sentiment analysis with a traditional online music playback system and propose a novel human-music emotional interaction system built on deep-learning-based multimodal sentiment analysis. Through individual observation and a questionnaire survey, the human-music sentiment interaction is shown to have a positive influence on listeners' negative emotions.
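Decision-level fusion combines per-modality predictions rather than raw features: each modality produces its own class-probability vector, and the vectors are merged into one decision. A minimal sketch; the uniform default weights are an illustrative assumption, not the paper's fusion rule:

```python
import numpy as np

def fuse_decisions(modal_probs, weights=None):
    """Decision-level fusion (sketch): weighted average of per-modality
    class-probability vectors, then argmax over the fused vector."""
    probs = np.asarray(modal_probs, dtype=float)
    if weights is None:
        weights = np.ones(len(probs)) / len(probs)  # equal weights by default
    fused = np.average(probs, axis=0, weights=weights)
    return fused, int(np.argmax(fused))
```

For example, fusing a facial-expression model's output with an audio model's output only requires that both emit probabilities over the same emotion classes.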
Citations: 0
Research on the practical path of Chinese movie and television communication from the perspective of “The Belt and Road”
Pub Date : 2022-08-01 DOI: 10.1109/cost57098.2022.00047
Jinchu Zhou, Ying Wang, Bo Li
Against the background of the Belt and Road Initiative, Chinese movies face a dual context of coexisting opportunities and difficulties in international communication. To explore feasible methods for the external diffusion of Chinese movies, we take 36 countries along the “Belt and Road” from December 2017 to December 2020 as the research object and use the Latent Dirichlet Allocation (LDA) topic model to summarize six categories of topics. According to the box-office distribution of each topic, the countries are divided into six categories, and the topic preferences of each category are discussed separately. To promote movie diffusion: for regions with significant cultural differences, promotion should follow their preferred topics; for regions with a better-developed movie market but less cooperation, high-quality local movies should be exported while co-productions are pursued; and for regions with deep cultural exchanges but a weak economy, movies featuring diverse cultures and national spirit should be exported. In addition, multiple factors such as political and religious background must be considered to avoid breaking group taboos, and export plans should be formulated according to local conditions to further achieve multi-ethnic and multicultural integration.
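Topic extraction with LDA, as used above, can be sketched with scikit-learn. The toy corpus, vectorizer settings, and six-topic choice below are illustrative assumptions standing in for the paper's movie data:

```python
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer

# Toy stand-in corpus: one short "document" per movie description.
docs = ["kung fu action hero fight", "love family drama tears",
        "history war epic battle", "comedy laugh funny family",
        "science future robot space", "music dance song stage"]

vec = CountVectorizer()
X = vec.fit_transform(docs)                     # bag-of-words counts
lda = LatentDirichletAllocation(n_components=6,  # six topic categories
                                random_state=0).fit(X)
doc_topics = lda.transform(X)  # per-document topic distribution, rows sum to 1
```

Grouping countries would then amount to aggregating these per-document topic distributions by each country's box-office share of the corresponding movies.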
Citations: 0
Performance Evaluation of Multi-Access Based on ATSSS Rules
Pub Date : 2022-08-01 DOI: 10.1109/CoST57098.2022.00086
Xinran Ba, Libiao Jin, Zhou Li, Chang Liu, Sidong Li
Fifth-generation mobile communication technology (5G) uses its extended capability of access traffic steering, switching, and splitting (ATSSS), currently being standardized, to enable multipath transmission of data. Both academia and industry are studying multi-access technologies based on the ATSSS function. Following the tenet of minimal modification to existing 5G protocol data unit session management, this study builds a multi-access network architecture to achieve multipath data transmission. The results reveal that multi-access technology can significantly improve average user throughput and network load balancing.
Citations: 0
A real-time localization algorithm based on feature point matching
Pub Date : 2022-08-01 DOI: 10.1109/CoST57098.2022.00056
Shuo Lei, Siyi Tian, Qiming Huang, Anyi Huang
When watching a video, people can roughly estimate the direction and displacement of camera movement from the change between two consecutive frames. We study this phenomenon and quantify it computationally. In this paper, we compare the current main image feature-point extraction techniques and search-matching techniques, and propose an algorithm that relies on camera images to calculate instantaneous displacement. The algorithm is well suited to guided robots, which are generally equipped with cameras. Based on feature-point extraction and matching, the algorithm calculates the camera displacement between two samples through homography matrix transformation and camera calibration. We conducted several comprehensive experiments in multiple environments and analyze the proposed algorithm based on the experimental results.
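The displacement-from-matches idea can be illustrated in a much-simplified form. The sketch below assumes a pure in-plane camera translation, so each match's point difference directly estimates the image displacement; the paper's full method instead recovers motion through a homography transformation plus camera calibration:

```python
import numpy as np

def estimate_translation(pts_prev, pts_next):
    """Simplified displacement estimate (sketch): under an assumed pure
    in-plane translation, average the per-match point differences.
    The median is used so a few bad matches do not skew the result."""
    d = np.asarray(pts_next, float) - np.asarray(pts_prev, float)
    return np.median(d, axis=0)
```

In a full pipeline, the matched points would come from a detector/descriptor such as ORB or SIFT, and a robust estimator (e.g. RANSAC) would reject outlier matches before this step.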
Citations: 0
Evaluation of Auditory Saliency Model Based on Saliency Map
Pub Date : 2022-08-01 DOI: 10.1109/CoST57098.2022.00061
Xiansong Xiong, Zhijun Zhao, Lingyun Xie
For the bottom-up auditory attention process, many auditory attention models have been proposed, including the earliest four auditory saliency models developed from visual saliency models: the Kayser, Kalinli, Duangudom, and Kaya models. To compare the correlation between the outputs of the four models and subjective perception, this paper first evaluates the models through a subjective saliency evaluation experiment, in which 20 kinds of sound-scene materials were scored for relative and absolute saliency, yielding two rankings. Second, each model's saliency score was computed for the same 20 sounds, the saliency of each sound was summarized by the mean, peak, variance, and dynamic characteristics of its saliency score, and correlations were calculated between the model saliency scores and the two subjective scores. The conclusion is that the Kalinli model performs best among the four and correlates most highly with subjective perception; among the four features of the saliency score, the variance has the highest correlation with subjective perception. The main reason for the Kalinli model's better results is that its method of extracting auditory spectrograms and features is more consistent with the auditory characteristics of the human ear, and the extracted features are more comprehensive. By analyzing the structure and perceptual features of the models whose output correlates highly with subjective perception, the models can be improved in the future based on these conclusions, enhancing their performance and making them more consistent with the auditory characteristics of the human ear.
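Summarizing a saliency-score time series by its mean, peak, variance, and dynamics can be sketched as follows. The "dynamic" measure here, mean absolute frame-to-frame change, is an assumption about what is computed; the paper does not spell out its exact definition:

```python
import numpy as np

def saliency_features(scores):
    """Summary features of one sound's saliency-score time series (sketch):
    mean, peak, variance, and an assumed dynamic measure (mean absolute
    frame-to-frame change)."""
    s = np.asarray(scores, dtype=float)
    return {"mean": float(s.mean()),
            "peak": float(s.max()),
            "var": float(s.var()),
            "dynamic": float(np.abs(np.diff(s)).mean())}
```

Correlating each of these features against the subjective rankings (e.g. with a Pearson or Spearman coefficient) then identifies which feature, here reportedly the variance, best tracks perception.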
Citations: 0
Development of VR Motion Sickness Test Platform Based on UE
Pub Date : 2022-08-01 DOI: 10.1109/CoST57098.2022.00043
Yixin Tai, Yu Yang, Xiaotian Wang
Virtual Reality (VR) technology is considered an important technical support for the Metaverse. However, the discomfort caused by VR, especially VR motion sickness, greatly affects the user experience, so studying VR comfort is particularly important. In this project, a motion sickness test platform based on Unreal Engine (UE) was developed to measure and improve VR comfort. The platform creates virtual three-dimensional scenes and can adjust parameters such as the rotation angular velocity and axis, the height of the virtual camera, and the number of black and white stripes. The rotation angular velocity and axis of the virtual camera were varied to verify the usability of the platform. The results indicate that VR motion sickness worsens to some extent as angular velocity increases, is more intense when rotating around the X and Y axes than around the Z axis, and occurs more often in women than in men.
Citations: 2
Performance comparison of deep learning methods on hand bone segmentation and bone age assessment
Pub Date : 2022-08-01 DOI: 10.1109/CoST57098.2022.00083
Yingying Lv, Jingtao Wang, Wenbo Wu, Yun Pan
Bone age is the biological age that reflects the growth and development of the human body. Bone age assessment plays an important role in clinical medicine, sports science, and justice. Well-designed convolutional neural network (CNN) models can greatly improve the accuracy and efficiency of bone age assessment. Comparing various hand bone segmentation models trained with classical convolutional neural networks, using intersection over union (IoU) and the Dice similarity coefficient (Dice) as evaluation indexes, we found that the segmentation model trained with U-Net performed best, reaching an IoU of 0.9746 and a Dice of 0.9871. This contradicts the common assumption that the U-Net++ model is superior to the U-Net model. Based on the images segmented by U-Net, we applied five common convolutional neural networks to bone age prediction, with mean absolute error (MAE) and the accuracy of errors within two years as evaluation indexes. The results showed that Xception achieved an MAE of 7.635, with 97.59% of errors within two years. In this paper, we provide an optimal scheme for bone age image segmentation and bone age assessment, and a theoretical basis for the design of bone age assessment systems.
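The two segmentation metrics, IoU and Dice, can be computed from binary masks as follows. This is a standard-definition sketch, not the paper's code:

```python
import numpy as np

def iou_and_dice(pred, truth):
    """Intersection-over-Union and Dice similarity coefficient for two
    binary masks of the same shape."""
    pred = np.asarray(pred, dtype=bool)
    truth = np.asarray(truth, dtype=bool)
    inter = np.logical_and(pred, truth).sum()
    union = np.logical_or(pred, truth).sum()
    total = pred.sum() + truth.sum()
    iou = inter / union if union else 1.0    # empty masks count as a match
    dice = 2 * inter / total if total else 1.0
    return float(iou), float(dice)
```

Note that Dice weights the overlap more generously than IoU (Dice = 2·IoU / (1 + IoU)), which is why a model's Dice is always at least as high as its IoU.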
Citations: 0
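The IoU and Dice scores reported in the abstract above are standard overlap metrics for binary segmentation masks. A minimal sketch of how they are computed, assuming flat 0/1 masks (the toy masks below are invented for illustration):

```python
def iou_and_dice(pred, target):
    """IoU and Dice for two flat binary masks (lists of 0/1) of equal length."""
    inter = sum(p & t for p, t in zip(pred, target))
    p_sum, t_sum = sum(pred), sum(target)
    union = p_sum + t_sum - inter
    iou = inter / union if union else 1.0
    dice = 2 * inter / (p_sum + t_sum) if (p_sum + t_sum) else 1.0
    return iou, dice

# Toy example: 4 predicted pixels, 3 ground-truth pixels, 3 overlapping.
pred   = [1, 1, 1, 1, 0, 0]
target = [0, 1, 1, 1, 0, 0]
iou, dice = iou_and_dice(pred, target)
print(round(iou, 4), round(dice, 4))  # 3/4 = 0.75, 2*3/(4+3) = 0.8571
```

Dice weights the intersection twice, so for the same prediction it is never lower than IoU, which is why the paper's Dice (0.9871) exceeds its IoU (0.9746).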
A Prediction Method for Dimensional Sentiment Analysis of the Movie and TV Drama based on Variable-length Sequence Input
Pub Date : 2022-08-01 DOI: 10.1109/CoST57098.2022.00010
Chunxiao Wang, Jingiing Zhang, Lihong Gan, Wei Jiang
The time-continuous emotion prediction problem has long been one of the difficulties in affective video content analysis. Current research mainly predicts emotion over temporally continuous long videos by dividing them into short segments of fixed duration. These methods ignore the time dependencies between short video clips and the mood changes within each clip. Therefore, drawing on the concepts of film and television narrative structure from cinematic language, this paper defines a prediction method for dimensional sentiment analysis of movie and TV drama based on variable-length sequence inputs. First, it defines a method for partitioning variable-length audiovisual sequences that sets the subunits of dimensional emotion prediction as variable-length inputs. Then, a method for extracting and combining the audio and visual features of each variable-length audiovisual sequence is proposed. Finally, a prediction network for dimensional emotion is designed based on variable-length sequence inputs. The proposed method is evaluated on the extended COGNIMUSE dataset. It achieves performance comparable to other methods while increasing prediction speed, with the Mean Square Error (MSE) reduced from 0.13 to 0.11 for arousal and from 0.19 to 0.13 for valence.
{"title":"A Prediction Method for Dimensional Sentiment Analysis of the Movie and TV Drama based on Variable-length Sequence Input","authors":"Chunxiao Wang, Jingiing Zhang, Lihong Gan, Wei Jiang","doi":"10.1109/CoST57098.2022.00010","DOIUrl":"https://doi.org/10.1109/CoST57098.2022.00010","url":null,"abstract":"Time continuous emotion prediction problem has always been one of the difficulties in affective video content analysis. The current research mainly designs a temporally continuous long video emotion prediction method by dividing the long video into short video segments of fixed duration. These methods ignore the time dependencies between short video clips and the mood changes in short video clips. Therefore, combined with the related concepts of film and television narrative structure in cinematic language, this paper defines a prediction method for dimensional sentiment analysis of the movie and TV drama based on variable sequence length inputs. First, this paper defines a method for partitioning variable-length audiovisual sequences that set subunits of dimensional emotion prediction as variable sequence-length inputs. Then, a method for extracting and combining audio and visual features of each variable-length audiovisual sequence is proposed. Finally, a prediction network for dimensional emotion is designed based on variable sequence length inputs. This paper focuses on dimensional sentiment prediction and evaluates the proposed method on the extended COGNIMUSE dataset. The method achieves comparable performance to other methods while increasing the prediction speed, with the Mean Square Error (MSE) reduced from 0.13 to 0.11 for arousal and from 0.19 to 0.13 for valence.","PeriodicalId":135595,"journal":{"name":"2022 International Conference on Culture-Oriented Science and Technology (CoST)","volume":"23 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116708025","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
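The arousal and valence MSE figures quoted in the abstract above are the standard regression metric for dimensional emotion prediction. A minimal sketch of how MSE over a sequence of per-segment predictions is computed (the values below are invented for illustration):

```python
def mse(predictions, targets):
    """Mean Square Error between two equal-length sequences of scores."""
    assert len(predictions) == len(targets) and predictions
    return sum((p - t) ** 2 for p, t in zip(predictions, targets)) / len(predictions)

# Hypothetical arousal predictions for four variable-length segments
# versus the time-continuous ground-truth annotation averaged per segment.
pred_arousal = [0.2, 0.5, -0.1, 0.4]
true_arousal = [0.3, 0.4, -0.3, 0.5]
print(round(mse(pred_arousal, true_arousal), 4))  # (0.01 + 0.01 + 0.04 + 0.01) / 4 = 0.0175
```

Arousal and valence are scored independently, which is why the paper reports a separate MSE for each dimension.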
Journal
2022 International Conference on Culture-Oriented Science and Technology (CoST)