Memomusic Version 2.0: Extending Personalized Music Recommendation with Automatic Music Generation
Pub Date: 2022-07-18 | DOI: 10.1109/ICMEW56448.2022.9859356
Luntian Mou, Yiyuan Zhao, Quan Hao, Yunhan Tian, Juehui Li, Jueying Li, Yiqi Sun, Feng Gao, Baocai Yin
Music emotion is a highly subjective and personal experience. We therefore previously developed a personalized music recommendation system called MemoMusic, which navigates listeners toward more positive emotional states based not only on music emotion but also on possible memories aroused by the music. In this paper, we extend MemoMusic with automatic music generation based on an LSTM network, which learns the characteristics of a short music clip with particular Valence and Arousal values and predicts a new music sequence in a similar style. We call this enhanced system MemoMusic Version 2.0. For the experiments, a new dataset of 177 MIDI pieces drawn from three categories (Classical, Popular, and Yanni) was collected and labelled using the Valence-Arousal model. Experimental results further demonstrate that memory is an influencing factor in perceived music emotion, and that MemoMusic Version 2.0 can moderately navigate listeners toward better emotional states.
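Below is a minimal sketch of the kind of LSTM next-note model the abstract describes: it learns from short note sequences and autoregressively samples a continuation in a similar style. The class name `NoteLSTM`, the 128-token pitch vocabulary, and the sampling routine are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch of an LSTM next-note predictor (illustrative; not the authors' code).
# Assumes notes are already encoded as integer tokens, e.g. MIDI pitches 0-127.
import torch
import torch.nn as nn

class NoteLSTM(nn.Module):
    def __init__(self, vocab_size=128, embed_dim=64, hidden_dim=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, num_layers=2, batch_first=True)
        self.head = nn.Linear(hidden_dim, vocab_size)

    def forward(self, tokens, state=None):
        x = self.embed(tokens)                # (batch, seq, embed)
        out, state = self.lstm(x, state)      # (batch, seq, hidden)
        return self.head(out), state          # logits over the next note

@torch.no_grad()
def continue_sequence(model, seed, length=64, temperature=1.0):
    """Feed a short seed clip (1, seed_len) and autoregressively sample a continuation."""
    model.eval()
    tokens = seed.clone()
    logits, state = model(tokens)
    for _ in range(length):
        probs = torch.softmax(logits[:, -1] / temperature, dim=-1)
        nxt = torch.multinomial(probs, 1)     # sample the next note token
        tokens = torch.cat([tokens, nxt], dim=1)
        logits, state = model(nxt, state)
    return tokens
```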
{"title":"Memomusic Version 2.0: Extending Personalized Music Recommendation with Automatic Music Generation","authors":"Luntian Mou, Yiyuan Zhao, Quan Hao, Yunhan Tian, Juehui Li, Jueying Li, Yiqi Sun, Feng Gao, Baocai Yin","doi":"10.1109/ICMEW56448.2022.9859356","DOIUrl":"https://doi.org/10.1109/ICMEW56448.2022.9859356","url":null,"abstract":"Music emotion experience is a rather subjective and personalized issue. Therefore, we previously developed a personalized music recommendation system called MemoMusic to navigate listeners to more positive emotional states based not only on music emotion but also on possible memories aroused by music. In this paper, we propose to extend MemoMusic with automatic music generation based on an LSTM network, which can learn the characteristic of a tiny music clip with particular Valence and Arousal values and predict a new music sequence with similar music style. We call this enhanced system MemoMusic Verison 2.0. For experiment, a new dataset of 177 music in MIDI format was collected and labelled using the Valence-Arousal model from three categories of Classical, Popular, and Yanni music. Experimental results further demonstrate that memory is an influencing factor in determining perceived music emotion, and MemoMusic Version 2.0 can moderately navigate listeners to better emotional states.","PeriodicalId":106759,"journal":{"name":"2022 IEEE International Conference on Multimedia and Expo Workshops (ICMEW)","volume":"26 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-07-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122313114","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
MOFN: Multi-Offset-Flow-Based Network for Video Restoration and Enhancement
Pub Date: 2022-07-18 | DOI: 10.1109/ICMEW56448.2022.9859519
Yiru Chen, Yumei Wang, Yu Liu
Video restoration and enhancement tasks, including video super-resolution (VSR), are designed to convert low-quality videos into high-quality videos to improve the audience's visual experience. In recent years, many deep learning methods using optical flow estimation or deformable convolution have been applied to video super-resolution. However, we find that motion estimation based on a single optical flow struggles to capture sufficient inter-frame information, and methods using deformable convolution lack explicit motion constraints, which limits their ability to handle fast motion. Therefore, we propose a multi-offset-flow-based network (MOFN) that makes more effective use of inter-frame information by using optical flow with offset diversity. We propose an alignment and compensation module that estimates optical flow with multiple offsets for neighbouring frames and performs frame alignment. The aligned frames are then fed into the fusion module, and high-quality frames are obtained after fusion and reconstruction. Extensive results show that the proposed model handles motion well and achieves favorable performance compared with state-of-the-art methods on several benchmark datasets.
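As a rough illustration of the alignment step described above, the sketch below warps a neighbouring frame toward the reference frame with an estimated flow field using bilinear sampling; offset diversity is mimicked by repeating the warp with several candidate flows. The function names and tensor shapes are assumptions for illustration, flow estimation itself is taken as given, and this is not the MOFN module.

```python
# Illustrative sketch of flow-based frame alignment (not the MOFN implementation).
import torch
import torch.nn.functional as F

def warp_by_flow(frame, flow):
    """frame: (B, C, H, W); flow: (B, 2, H, W) in pixels (dx, dy)."""
    b, _, h, w = frame.shape
    ys, xs = torch.meshgrid(torch.arange(h), torch.arange(w), indexing="ij")
    grid = torch.stack((xs, ys), dim=0).float().to(frame.device)  # base pixel coordinates
    coords = grid.unsqueeze(0) + flow                             # sampling positions
    # Normalize to [-1, 1] as required by grid_sample.
    coords_x = 2.0 * coords[:, 0] / (w - 1) - 1.0
    coords_y = 2.0 * coords[:, 1] / (h - 1) - 1.0
    sample_grid = torch.stack((coords_x, coords_y), dim=-1)       # (B, H, W, 2)
    return F.grid_sample(frame, sample_grid, align_corners=True)

def align_with_offset_diversity(neighbour, flows):
    """flows: list of (B, 2, H, W) candidate flow fields; returns the stacked warps."""
    return torch.stack([warp_by_flow(neighbour, f) for f in flows], dim=1)
```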
{"title":"MOFN: Multi-Offset-Flow-Based Network for Video Restoration and Enhancement","authors":"Yiru Chen, Yumei Wang, Yu Liu","doi":"10.1109/ICMEW56448.2022.9859519","DOIUrl":"https://doi.org/10.1109/ICMEW56448.2022.9859519","url":null,"abstract":"Video restoration and enhancement tasks, including video super-resolution(VSR), are designed to convert low-quality videos into high-quality videos to improve the audience’s visual experience. In recent years, many deep learning methods using optical flow estimation or deformable convolution have been applied to video super-resolution. However, we find that motion estimation based on a single optical flow is difficult to capture enough inter-frame information, and the method using deformable convolution lacks clear motion constraints, which affects its ability to process fast motion. Therefore, we propose a multi-offset-flow-based network (MOFN) to make more effective use of inter-frame information by using optical flow with offset diversity. We proposed an alignment and compensation module that can estimate the optical flow with multiple offsets for neighbouring frames and perform frame alignment. The aligned video frames will be fed into the fusion module, and high-quality video frames will be obtained after fusion and reconstruction. Extensive results show that our proposed model has a good ability to process motion. On several benchmark datasets, our method has achieved favorable performance compared with the most advanced methods.","PeriodicalId":106759,"journal":{"name":"2022 IEEE International Conference on Multimedia and Expo Workshops (ICMEW)","volume":"24 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-07-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131738283","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pyramid-Context Guided Feature Fusion for RGB-D Semantic Segmentation
Pub Date: 2022-07-18 | DOI: 10.1109/ICMEW56448.2022.9859353
Haoming Liu, Li Guo, Zhongwen Zhou, Hanyuan Zhang
Incorporating depth information into RGB images has proven effective in semantic segmentation. Multi-modal feature fusion, which integrates depth and RGB features, is a crucial component in determining segmentation accuracy. Most existing multi-modal feature fusion schemes enhance multi-modal features via channel-wise attention modules that leverage global context information. In this work, we propose a novel pyramid-context guided fusion (PCGF) module to fully exploit the complementary information from the depth and RGB features. The proposed PCGF utilizes both local and global contexts inside the attention module to provide effective guidance for fusing cross-modal features of inconsistent semantics. Moreover, we introduce a lightweight yet practical multi-level general fusion module to combine features at multiple levels of abstraction and enable high-resolution prediction. Utilizing the proposed feature fusion modules, our Pyramid-Context Guided Network (PCGNet) learns discriminative features by taking full advantage of multi-modal and multi-level information. Comprehensive experiments demonstrate that the proposed PCGNet achieves state-of-the-art performance on two benchmark datasets, NYUDv2 and SUN-RGBD.
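The sketch below illustrates the general idea of attention-guided RGB-D fusion with both a global (pooled) and a local (convolutional) context branch gating the depth features before they are merged with the RGB features. The module structure and names are illustrative assumptions and do not reproduce the PCGF module.

```python
# Illustrative sketch of attention-guided RGB-D feature fusion (not the PCGF module).
import torch
import torch.nn as nn

class SimpleGuidedFusion(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.global_ctx = nn.Sequential(       # global context: channel gate from pooled features
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels, kernel_size=1),
            nn.Sigmoid(),
        )
        self.local_ctx = nn.Sequential(        # local context: spatial attention map
            nn.Conv2d(channels, 1, kernel_size=3, padding=1),
            nn.Sigmoid(),
        )

    def forward(self, rgb_feat, depth_feat):
        x = rgb_feat + depth_feat
        gate = self.global_ctx(x) * self.local_ctx(x)   # (B, C, 1, 1) * (B, 1, H, W)
        return rgb_feat + gate * depth_feat              # attention-weighted fusion

# usage (illustrative): fused = SimpleGuidedFusion(256)(rgb_feat, depth_feat)
```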
{"title":"Pyramid-Context Guided Feature Fusion for RGB-D Semantic Segmentation","authors":"Haoming Liu, Li Guo, Zhongwen Zhou, Hanyuan Zhang","doi":"10.1109/ICMEW56448.2022.9859353","DOIUrl":"https://doi.org/10.1109/ICMEW56448.2022.9859353","url":null,"abstract":"Incorporating depth information into RGB images has proven its effectiveness in semantic segmentation. The multi-modal feature fusion, which integrates depth and RGB features, is a crucial component determining segmentation accuracy. Most existing multi-modal feature fusion schemes enhance multi-modal features via channel-wise attention modules which leverage global context information. In this work, we propose a novel pyramid-context guided fusion (PCGF) module to fully exploit the complementary information from the depth and RGB features. The proposed PCGF utilizes both local and global contexts inside the attention module to provide effective guidance for fusing cross-modal features of inconsistent semantics. Moreover, we introduce a lightweight yet practical multi-level general fusion module to combine the features at multiple levels of abstraction to enable high-resolution prediction. Utilizing the proposed feature fusion modules, our Pyramid-Context Guided Network (PCGNet) can learn discriminative features by taking full advantage of multi-modal and multi-level information. Our comprehensive experiments demonstrate that the proposed PCGNet achieves state-of-the-art performance on two benchmark datasets NYUDv2 and SUN-RGBD.","PeriodicalId":106759,"journal":{"name":"2022 IEEE International Conference on Multimedia and Expo Workshops (ICMEW)","volume":"24 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-07-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132627692","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Integer Network for Cross Platform Graph Data Lossless Compression
Pub Date: 2022-07-18 | DOI: 10.1109/ICMEW56448.2022.9859525
Ge Zhang, Huanyu He, Haiyang Wang, Weiyao Lin
Learned data compression techniques have been shown to outperform conventional ones. However, non-deterministic floating-point computation makes probability prediction inconsistent between sender and receiver, preventing practical deployment. We propose to use an integer network to address this problem, focusing on lossless compression of graph data. First, we propose an adaptive fixed-point format, AdaFixedPoint, which converts a floating-point model containing graph convolution layers into a fixed-point one with minimal precision loss, enabling deterministic lossless compression of graph data. Second, we propose QbiasFree Compensation and Bin Regularization to quantize the network with fewer bits, reducing the computation cost. Experiments show that the proposed integer network achieves successful cross-platform graph data compression and, compared with the commonly used 8 bits, reduces the average quantization bit width to 5 bits without a performance drop.
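As a toy illustration of the underlying idea, the sketch below quantizes a floating-point tensor to integers with a power-of-two scale chosen from its range, so that sender and receiver can perform identical integer arithmetic. The bit-allocation heuristic is an assumption for illustration and is not the AdaFixedPoint format.

```python
# Toy sketch of fixed-point quantization (illustrative; not AdaFixedPoint itself).
import numpy as np

def to_fixed_point(x, total_bits=8):
    """Pick a fraction length that fits the tensor's range, then quantize."""
    max_abs = np.max(np.abs(x)) + 1e-12
    int_bits = max(0, int(np.ceil(np.log2(max_abs))) + 1)   # sign bit + integer part
    frac_bits = max(0, total_bits - int_bits)
    scale = 2 ** frac_bits
    q = np.clip(np.round(x * scale), -(2 ** (total_bits - 1)), 2 ** (total_bits - 1) - 1)
    return q.astype(np.int32), frac_bits

def from_fixed_point(q, frac_bits):
    return q.astype(np.float32) / (2 ** frac_bits)

w = np.random.randn(4, 4).astype(np.float32)
q, f = to_fixed_point(w, total_bits=8)
print("max abs error:", np.max(np.abs(w - from_fixed_point(q, f))))
```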
{"title":"Integer Network for Cross Platform Graph Data Lossless Compression","authors":"Ge Zhang, Huanyu He, Haiyang Wang, Weiyao Lin","doi":"10.1109/ICMEW56448.2022.9859525","DOIUrl":"https://doi.org/10.1109/ICMEW56448.2022.9859525","url":null,"abstract":"It has been witnessed that the learned data compression techniques has outperformed conventional ones. However, the non-deterministic floating-point calculation makes the probability prediction inconsistent between sender and receiver, disabling practical applications. We propose to use the integer network to relieve this problem and focus on graph data lossless compression. Firstly, we propose an adaptive fixed-point format, AdaFixedPoint, which can convert a floating-point model, which has graph convolution layers to a fixed-point one with minimal precision loss and enable deterministic graph data lossless compression. Secondly, we propose QbiasFree Compensation and Bin Regularization to quantize the network with fewer bits, relieving the computation cost. Experiments show that our proposed integer network can achieve successful cross-platform graph data compression. And compared with the commonly used 8 bits, our method remarkably decreases the quantized average bit to 5 bits, without a performance drop.","PeriodicalId":106759,"journal":{"name":"2022 IEEE International Conference on Multimedia and Expo Workshops (ICMEW)","volume":"40 10 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-07-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"120945354","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Regularized DTW in Offline Music Score-Following for Sight-Singing Based on Sol-fa Name Recognition
Pub Date: 2022-07-18 | DOI: 10.1109/ICMEW56448.2022.9859398
Rongfeng Li, Kuoxi Yu
Automatic scoring for singing evaluation has been a hot topic in recent years, and improving score following is the first step toward improving evaluation accuracy. Most commonly used methods are based on dynamic time warping (DTW), but for audio with low singing quality and inaccurate pitch, DTW often predicts onsets incorrectly. To address these problems, this paper focuses on offline score following and makes two main improvements: (1) sol-fa name recognition is performed before pitch tracking as a preprocessing step, since we cannot guarantee that a singer's pitch is correct but can assume that the sol-fa names are pronounced correctly; (2) a regularized DTW is proposed on the basis of the sol-fa name recognition. The results show that for general audio, with a tolerance of 20 ms, our algorithm improves accuracy from about 86% for ordinary DTW to about 92%, while reducing the average note prediction error by about 23 ms. For audio with a low signal-to-noise ratio and unstable voice frequency, alignment improves by about 20% compared with ordinary DTW.
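The sketch below shows a plain DTW accumulation with an added regularization term that penalizes alignment cells drifting far from the expected diagonal path (i.e., from the tempo implied by the score). This is one plausible reading of a regularized DTW and is not the paper's exact formulation.

```python
# Minimal DTW with a path-regularization term (illustrative; the paper's exact
# regularizer is not reproduced here).
import numpy as np

def regularized_dtw(cost, lam=0.1):
    """cost: (N, M) local cost matrix between performance frames and score frames."""
    n, m = cost.shape
    acc = np.full((n, m), np.inf)
    acc[0, 0] = cost[0, 0]
    for i in range(n):
        for j in range(m):
            if i == 0 and j == 0:
                continue
            # Penalize cells far from the expected diagonal alignment.
            drift = abs(i / (n - 1) - j / (m - 1)) if n > 1 and m > 1 else 0.0
            best_prev = min(
                acc[i - 1, j] if i > 0 else np.inf,
                acc[i, j - 1] if j > 0 else np.inf,
                acc[i - 1, j - 1] if i > 0 and j > 0 else np.inf,
            )
            acc[i, j] = cost[i, j] + lam * drift + best_prev
    return acc[-1, -1]
```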
{"title":"Regularized DTW in Offline Music Score-Following for Sight-Singing Based on Sol-fa Name Recognition","authors":"Rongfeng Li, Kuoxi Yu","doi":"10.1109/ICMEW56448.2022.9859398","DOIUrl":"https://doi.org/10.1109/ICMEW56448.2022.9859398","url":null,"abstract":"The automatic scoring of singing evaluation is a hot topic in recent years. Improving the score following effect is the first step to improve the accuracy of evaluation. Most of the commonly used methods are based on DTW, but for audios with low singing quality and inaccurate pitch, DTW often predicts the onset incorrectly. In order to solve the above problems, this paper focus on the offline following, mainly improves from two aspects: 1. Sol-fa name recognition is done before pitch tracking as preprocess. We cannot guarantee that the pitch of the singer is correct, but we can assume that the singer pronounces the sol-fa name correctly, so we use sol-fa name recognition as preprocessing; 2. Regularized DTW is proposed based on the basis of sol-fa name recognition. The results show that for general audio, under the condition of a tolerance of 20ms, compared with about 86% accuracy of ordinary DTW algorithm, our algorithm has improved to about 92%, while the average error of predicted notes is reduced by about 23ms. For audio with low signal-to-noise ratio and unstable voice frequency, the alignment effect is improved by about 20% compared with ordinary DTW.","PeriodicalId":106759,"journal":{"name":"2022 IEEE International Conference on Multimedia and Expo Workshops (ICMEW)","volume":"27 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-07-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123729766","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
DAMUS: A Collaborative System for Choreography and Music Composition
Pub Date: 2022-07-18 | DOI: 10.1109/ICMEW56448.2022.9859441
Tiange Zhou, Borou Yu, Jiajian Min, Zeyu Wang
Throughout the history of dance and music collaborations, composers and choreographers have always engaged in separate workflows. Usually, they compose the music and choreograph the moves separately, and the lack of mutual understanding of each other's artistic approach results in long production times. There is a strong need in the performance industry to reduce the time spent establishing a collaborative foundation and allow for more productive creation. We propose DAMUS, a work-in-progress collaborative system for choreography and music composition, to reduce production time and boost productivity. DAMUS is composed of a dance module, DA, and a music module, MUS. DA translates dance motion into MoCap data, Labanotation, and number notation, and sets rules of variation for choreography. MUS produces musical materials that fit the tempo and rhythm of specific dance genres or moves. We applied our system prototype to case studies in three different genres. In the future, we plan to pursue more genres and further develop DAMUS with evolutionary computation and style transfer.
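As a toy illustration of the kind of tempo-constrained material generation MUS is described as performing, the snippet below snaps candidate note onsets to a beat grid derived from a dance tempo. The function and parameters are hypothetical and are not part of DAMUS.

```python
# Toy illustration (not the DAMUS implementation): snap candidate note onsets to a
# beat grid derived from a dance tempo, so generated material fits the dance rhythm.
def quantize_onsets(onsets_sec, bpm=120, subdivisions=4):
    """Snap onset times (seconds) to the nearest subdivision of the beat."""
    step = 60.0 / bpm / subdivisions          # grid spacing in seconds
    return [round(t / step) * step for t in onsets_sec]

print(quantize_onsets([0.03, 0.27, 0.49, 0.74], bpm=120, subdivisions=4))
# -> [0.0, 0.25, 0.5, 0.75]
```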
{"title":"DAMUS: A Collaborative System for Choreography and Music Composition","authors":"Tiange Zhou, Borou Yu, Jiajian Min, Zeyu Wang","doi":"10.1109/ICMEW56448.2022.9859441","DOIUrl":"https://doi.org/10.1109/ICMEW56448.2022.9859441","url":null,"abstract":"Throughout the history of dance and music collaborations, composers and choreographers have always engaged in separate workflows. Usually, composers and choreographers complete the music and choreograph the moves separately, where the lack of mutual understanding of their artistic approaches results in a long production time. There is a strong need in the performance industry to reduce the time for establishing a collaborative foundation, allowing for more productive creations. We propose DAMUS, a work-in-progress collaborative system for choreography and music composition, in order to reduce production time and boost productivity.DAMUS is composed of a dance module DA and a music module MUS. DA translates dance motion into MoCap data, Labanotation, and number notation, and sets rules of variations for choreography. MUS produces musical materials that fit the tempo and rhythm of specific dance genres or moves. We applied our system prototype to case studies in three different genres. In the future, we plan to pursue more genres and further develop DAMUS with evolutionary computation and style transfer.","PeriodicalId":106759,"journal":{"name":"2022 IEEE International Conference on Multimedia and Expo Workshops (ICMEW)","volume":"9 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-07-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127675449","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Emotion Recognition Based on Representation Dissimilarity Matrix
Pub Date: 2022-07-18 | DOI: 10.1109/ICMEW56448.2022.9859269
Hongjian Bo, Cong Xu, Boying Wu, Lin Ma, Haifeng Li
Emotion recognition based on electroencephalography (EEG) has attracted wide attention because EEG can reflect intrinsic emotional information. Although much progress has been made, great challenges remain; for example, strict recording conditions make it difficult to apply in real life. Therefore, this article proposes an emotion-induction experiment based on everyday sounds, which is closer to real working environments. A feature optimization method based on the representation dissimilarity matrix is then proposed, feature evaluation criteria are established, and emotion-related features are identified. EEG data were collected from 16 volunteers listening to different emotional sounds, and three types of EEG features were extracted: higher-order crossings, power spectral density, and differential asymmetry. After feature optimization and model construction, the recognition rate for high versus low valence reached 69%. This study explores listeners' dynamic responses to sound and shows that environmental sounds can effectively induce emotional states that can be recognized from EEG, which could help AI better understand people's preferences and needs.
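A representation dissimilarity matrix is conventionally built from pairwise dissimilarities (for example, 1 minus Pearson correlation) between condition-wise feature vectors, as in the minimal sketch below; how the paper scores individual features against the RDM is not reproduced here.

```python
# Minimal sketch of a representation dissimilarity matrix (RDM): pairwise
# dissimilarity (1 - Pearson correlation) between condition feature vectors.
import numpy as np

def rdm(features):
    """features: (n_conditions, n_features) -> (n_conditions, n_conditions) RDM."""
    corr = np.corrcoef(features)      # correlation between condition vectors
    return 1.0 - corr

feats = np.random.randn(6, 32)        # e.g. 6 sound conditions, 32 EEG features each
print(np.round(rdm(feats), 2))
```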
{"title":"Emotion Recognition Based on Representation Dissimilarity Matrix","authors":"Hongjian Bo, Cong Xu, Boying Wu, Lin Ma, Haifeng Li","doi":"10.1109/ICMEW56448.2022.9859269","DOIUrl":"https://doi.org/10.1109/ICMEW56448.2022.9859269","url":null,"abstract":"Emotion recognition based on electroencephalogram (EEG) has been widely concerned because it could reflect intrinsic emotional information. Although a large number of achievements have been made, great challenges still exist. For example, strict identification conditions make it difficult to apply in real life. Therefore, an experimental method of emotion induction based on daily sounds is proposed in this article, which is closer to the everyday work environment. Then, a feature optimization method based on the representation dissimilarity matrix is proposed. Finally, the feature evaluation criteria are established and the emotion-related features are found. In this article, EEG data of 16 volunteers in different emotional sounds were collected. Three types of EEG feature: high-order crossing, power spectral density and difference asymmetry were extracted. After feature optimization, and model construction, the recognition rate of high and low valence was up to 69%. This study explores the dynamic response of people listening to sound and shows that the environmental sound could effectively induce and recognize emotional status, which could better help AI understand people’s preferences and needs.","PeriodicalId":106759,"journal":{"name":"2022 IEEE International Conference on Multimedia and Expo Workshops (ICMEW)","volume":"48 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-07-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127387583","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Demusa: Demo for Multimodal Sentiment Analysis
Pub Date: 2022-07-18 | DOI: 10.1109/ICMEW56448.2022.9859289
Soyeon Hong, Jeonghoon Kim, Donghoon Lee, Hyunsouk Cho
Recently, many Multimodal Sentiment Analysis (MSA) models have appeared for understanding opinions in multimedia. To accelerate MSA research, CMU-MOSI and CMU-MOSEI were released as open datasets. However, it is hard to inspect the input data elements in detail and to analyze each video clip's prediction results for qualitative evaluation. For these reasons, this paper presents DeMuSA, a demo for multimodal sentiment analysis that lets users explore raw data instances and compare prediction models at the utterance level.
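The snippet below is a generic example of the kind of utterance-level, side-by-side comparison such a demo enables: gold labels and two models' sentiment predictions are tabulated per utterance and the better model is flagged. The column names and values are hypothetical and do not reflect the DeMuSA interface.

```python
# Generic sketch of utterance-level model comparison (hypothetical data, not DeMuSA).
import pandas as pd

df = pd.DataFrame({
    "utterance_id": ["clip1_u1", "clip1_u2", "clip2_u1"],
    "gold":    [ 1.8, -0.6,  0.4],
    "model_a": [ 1.5, -1.1,  0.9],
    "model_b": [ 2.1,  0.2,  0.3],
})
df["err_a"] = (df["model_a"] - df["gold"]).abs()
df["err_b"] = (df["model_b"] - df["gold"]).abs()
df["better"] = df[["err_a", "err_b"]].idxmin(axis=1)   # which model is closer per utterance
print(df)
```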
{"title":"Demusa: Demo for Multimodal Sentiment Analysis","authors":"Soyeon Hong, Jeonghoon Kim, Donghoon Lee, Hyunsouk Cho","doi":"10.1109/ICMEW56448.2022.9859289","DOIUrl":"https://doi.org/10.1109/ICMEW56448.2022.9859289","url":null,"abstract":"Recently, a lot of Multimodal Sentiment Analysis (MSA) models appeared to understanding opinions in multimedia. To accelerate MSA researches, CMU-MOSI and CMU-MOSEI were released as the open-datasets. However, it is hard to observe the input data elements in detail and analyze the prediction model results with each video clip for qualitative evaluation. For these reasons, this paper suggests DeMuSA, demo for multimodal sentiment analysis to explore raw data instance and compare prediction models by utterance-level.","PeriodicalId":106759,"journal":{"name":"2022 IEEE International Conference on Multimedia and Expo Workshops (ICMEW)","volume":"46 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-07-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133956038","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Diversity-Based Media Search
Pub Date: 2022-07-18 | DOI: 10.1109/ICMEW56448.2022.9859474
P. Aarabi
In this paper, we outline a method for searching a set of information based on both the individual diversity of the set's constituent elements and its overall ensemble diversity. Using the example of searching user accounts on Instagram, we perform searches based on the representative diversity of the posts (across race, age, gender, body type, skin tone, and disability) as well as the overall diversity of the search results.
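One simple way to combine the two notions is to average per-item diversity scores and add an ensemble term such as the entropy of attribute distributions over the result set, as in the toy sketch below; the attributes, weights, and scoring function are illustrative assumptions, not the paper's method.

```python
# Toy sketch of combining individual and ensemble diversity when scoring a result
# set (attributes, weights, and scoring are illustrative, not the paper's method).
import math
from collections import Counter

def ensemble_entropy(results, attribute):
    """Shannon entropy of an attribute's distribution across the result set."""
    counts = Counter(r[attribute] for r in results)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def score_result_set(results, w_individual=0.5, w_ensemble=0.5):
    individual = sum(r["individual_diversity"] for r in results) / len(results)
    ensemble = ensemble_entropy(results, "age_group") + ensemble_entropy(results, "skin_tone")
    return w_individual * individual + w_ensemble * ensemble

results = [
    {"individual_diversity": 0.7, "age_group": "18-25", "skin_tone": "light"},
    {"individual_diversity": 0.4, "age_group": "36-50", "skin_tone": "dark"},
    {"individual_diversity": 0.9, "age_group": "18-25", "skin_tone": "medium"},
]
print(score_result_set(results))
```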
{"title":"Diversity-Based Media Search","authors":"P. Aarabi","doi":"10.1109/ICMEW56448.2022.9859474","DOIUrl":"https://doi.org/10.1109/ICMEW56448.2022.9859474","url":null,"abstract":"In this paper, we outline a method for searching a set of information based on both the individual diversity of the constituent elements of the set as well as its overall ensemble diversity. Using the example of searching user accounts on Instagram, we are able to perform searches based on the representative diversity (across race, age, gender, body type, skin tone, and disability) of the posts as well as the overall diversity of the search results.","PeriodicalId":106759,"journal":{"name":"2022 IEEE International Conference on Multimedia and Expo Workshops (ICMEW)","volume":"9 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-07-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132819165","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Video Object Segmentation with Online Mask Refinement
Pub Date: 2022-07-18 | DOI: 10.1109/ICMEW56448.2022.9859386
Tomoya Sawada, Teng-Yok Lee, Masahiro Mizuno
This paper proposes a simple and effective video object instance segmentation method that requires no fine-tuning, named the Mask Refinement Module (MRM). Many works address the labeling problem of separating foreground objects, but most require retraining their networks on the target data. In real scenarios, it is not easy to collect and label data from the target environment due to security policies or cost, especially in industry. We solve this problem by refining object masks with a video-based online learning method that adapts to various changes frame by frame. Extensive experiments show that our approach is highly effective compared with modern methods, improving F-measure by up to 13.9% on large video surveillance datasets such as CDNet (118K images).
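Since the reported gains are in F-measure on CDNet, the sketch below shows the standard foreground F-measure computation for binary masks; it illustrates the evaluation metric, not the Mask Refinement Module itself.

```python
# Standard foreground F-measure for binary masks (the metric cited in the abstract).
import numpy as np

def f_measure(pred, gt, eps=1e-8):
    """pred, gt: boolean arrays of the same shape (True = foreground)."""
    tp = np.logical_and(pred, gt).sum()
    precision = tp / (pred.sum() + eps)
    recall = tp / (gt.sum() + eps)
    return 2 * precision * recall / (precision + recall + eps)

pred = np.array([[1, 1, 0], [0, 1, 0]], dtype=bool)
gt   = np.array([[1, 0, 0], [0, 1, 1]], dtype=bool)
print(round(float(f_measure(pred, gt)), 3))
```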
{"title":"Video Object Segmentation with Online Mask Refinement","authors":"Tomoya Sawada, Teng-Yok Lee, Masahiro Mizuno","doi":"10.1109/ICMEW56448.2022.9859386","DOIUrl":"https://doi.org/10.1109/ICMEW56448.2022.9859386","url":null,"abstract":"This paper proposes a simple and effective video object instance segmentation method without fine-tuning named Mask Refinement Module(MRM). Many papers settle a labeling problem aiming to separate foreground objects, but most of them require training their networks again on target data. In a real scenario, it is not easy to collect dataset on the target environment and to label them as well due to security policies or a cost problem, especially for industry. We solve the problem by reshaping object masks with a video based online-learning method that enables us to adapt various changes frame by frame. In extensive experiments, results show that our approach is highly effective compared to modern methods by up to 13.9% improving of F-measure on large video surveillance dataset such as CDNet (118K images).","PeriodicalId":106759,"journal":{"name":"2022 IEEE International Conference on Multimedia and Expo Workshops (ICMEW)","volume":"18 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-07-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128074320","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}