
Latest Publications: 2022 IEEE International Conference on Multimedia and Expo Workshops (ICMEW)

Transformer Based Multimodal Scene Recognition in Soccer Videos
Pub Date : 2022-07-18 DOI: 10.1109/ICMEW56448.2022.9859304
Yaozong Gan, Ren Togo, Takahiro Ogawa, M. Haseyama
This paper presents a transformer-based multimodal soccer scene recognition method that uses both visual and audio modalities. Our approach directly feeds the original video frames and the audio spectrogram of the soccer video into the transformer model, which captures the spatial information of an action at a given moment as well as the contextual temporal information between different actions in the video. We fuse the video-frame and audio-spectrogram representations output by the transformer model to better identify scenes that occur in real soccer matches. The late fusion performs a weighted average of the visual and audio estimation results to obtain complete information about a soccer scene. We evaluate the proposed method on the SoccerNet-V2 dataset and confirm that it achieves the best performance compared with recent state-of-the-art methods.
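The late-fusion step described in the abstract can be illustrated with a minimal sketch: per-class scores from the visual and audio transformer branches are turned into probabilities and combined by a weighted average. The class count, fusion weight, and function names below are assumptions for illustration, not the authors' implementation.

```python
# Hedged sketch of weighted-average late fusion of two modality-specific
# score vectors; the fusion weight and class count are assumptions.
import numpy as np

def late_fuse(visual_logits: np.ndarray,
              audio_logits: np.ndarray,
              visual_weight: float = 0.6) -> np.ndarray:
    """Weighted average of softmax probabilities from the two modalities."""
    def softmax(x):
        e = np.exp(x - x.max(axis=-1, keepdims=True))
        return e / e.sum(axis=-1, keepdims=True)

    p_visual = softmax(visual_logits)
    p_audio = softmax(audio_logits)
    return visual_weight * p_visual + (1.0 - visual_weight) * p_audio

# Example: one clip scored over a hypothetical set of 17 scene classes
visual = np.random.randn(1, 17)
audio = np.random.randn(1, 17)
scene = late_fuse(visual, audio).argmax(axis=-1)
```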
Citations: 0
SAL-360IQA: A Saliency Weighted Patch-Based CNN Model for 360-Degree Images Quality Assessment
Pub Date : 2022-07-18 DOI: 10.1109/ICMEW56448.2022.9859468
Abderrezzaq Sendjasni, M. Larabi
Since the introduction of 360-degree images, a significant number of deep learning based image quality assessment (IQA) models have been proposed. Most of them are based on multichannel architectures in which several convolutional neural networks (CNNs) are used together. Despite their competitive results, these models come at a higher cost in terms of complexity. To significantly reduce this complexity and ease the training of the CNN model, this paper proposes a patch-based scheme dedicated to 360-degree IQA. Our framework includes latitude-based patch selection and extraction to account for the importance of the equatorial region, data normalization, a CNN-based architecture, and weighted average pooling of the predicted local qualities. We evaluate the proposed model on two widely used databases and show its superiority over state-of-the-art models, even multichannel ones. Furthermore, a cross-database assessment reveals good generalization ability, demonstrating the robustness of the proposed model.
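The final pooling stage lends itself to a short sketch: per-patch quality predictions from the CNN are combined into one global score with per-patch weights, here illustrated with a latitude-based cosine weighting that emphasizes the equatorial region. The weighting formula and function names are assumptions, not the paper's exact scheme.

```python
# Hedged sketch of weighted average pooling of local quality predictions.
import numpy as np

def latitude_weight(latitude_deg: np.ndarray) -> np.ndarray:
    """Cosine weighting that emphasizes equatorial patches (assumption)."""
    return np.cos(np.deg2rad(latitude_deg))

def pooled_quality(patch_scores: np.ndarray, weights: np.ndarray) -> float:
    """Weighted average of the local quality predictions."""
    weights = weights / weights.sum()
    return float(np.dot(weights, patch_scores))

# Example: 32 patches with CNN-predicted local scores and their latitudes
scores = np.random.uniform(1.0, 5.0, size=32)
lats = np.random.uniform(-90.0, 90.0, size=32)
global_score = pooled_quality(scores, latitude_weight(lats))
```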
Citations: 6
3DSTNet: Neural 3D Shape Style Transfer
Pub Date : 2022-07-18 DOI: 10.1109/ICMEW56448.2022.9859470
Abhinav Upadhyay, Alpana Dubey, Suma Mani Kuriakose, Devasish Mahato
In this work, we propose a 3D style transfer framework, 3DSTNet, to transfer shape or geometric properties from a style 3D object to a content 3D object. We analyze the effects of multiple model hyperparameters on 3D style transfer. To evaluate the proposed framework, we conduct a user study with 3D designers. Our evaluation results demonstrate that our approach effectively generates new designs and that the generated designs aid designers' creativity.
Citations: 1
Smileverse: A VR Experience
Pub Date : 2022-07-18 DOI: 10.1109/ICMEW56448.2022.9859417
Yi-Ping Hung, Jerry Chin-Han Goh, Yuan-An Chan, Hsiao-Yuan Chin, You-Shin Tsai, Chien-Hsin Ju
SmileVerse is a continuation of our previous work, Smiling Buddha [1], which aims to achieve emotional contagion. In the interactive installation Smiling Buddha, we designed a natural interactive process that lets smiles become contagious. Building on VR/AI technologies, we upgraded our previous artwork into a virtual universe that uses facial trackers to detect the user's expressions and lets virtual characters respond interactively, so that smiles can spread.
Citations: 0
Tachiegan: Generative Adversarial Networks for Tachie Style Transfer
Pub Date : 2022-07-18 DOI: 10.1109/ICMEW56448.2022.9859510
Zihan Chen, X. Chen
Tachie painting is an emerging digital portrait art form that shows a character in a standing pose. Automatically generating a Tachie picture from a real photo would facilitate many creation tasks. However, it is non-trivial to represent Tachie's artistic styles and to establish a delicate mapping from the real-world image domain to the Tachie domain. Existing approaches generally suffer from inaccurate style transformation and severe structure distortion when applied to Tachie style transfer. In this paper, we propose the first approach for Tachie stylization of portrait photographs. Based on the unsupervised CycleGAN framework, we design two novel loss functions to emphasize lines and tones in the Tachie style. Furthermore, we design a character-enhanced stylization framework that introduces an auxiliary body mask to better preserve the global body structure. Experimental results demonstrate the robustness and better generation capability of our method in Tachie stylization of photos in a wide range of poses, even when trained on a small dataset.
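The body-mask idea can be illustrated with a minimal sketch: a reconstruction term in which pixels inside an auxiliary body mask are weighted more heavily, so structural errors on the character cost more than errors in the background. The exact loss functions used in the paper are not reproduced here; the weighting scheme, weight value, and function names are assumptions.

```python
# Hedged sketch: mask-weighted L1 term of the kind a body mask might drive.
import numpy as np

def masked_l1(generated: np.ndarray, reference: np.ndarray,
              body_mask: np.ndarray, body_weight: float = 2.0) -> float:
    """L1 difference where pixels inside the body mask count more."""
    weights = 1.0 + (body_weight - 1.0) * body_mask   # mask values in {0, 1}
    return float(np.mean(weights * np.abs(generated - reference)))

# Example: 256x256 RGB images and a binary body mask covering the figure
gen = np.random.rand(256, 256, 3)
ref = np.random.rand(256, 256, 3)
mask = np.zeros((256, 256, 3))
mask[64:192, 96:160, :] = 1.0
loss = masked_l1(gen, ref, mask)
```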
Citations: 0
ICMEW 2022 Cover Page
Pub Date : 2022-07-18 DOI: 10.1109/icmew56448.2022.9859515
{"title":"ICMEW 2022 Cover Page","authors":"","doi":"10.1109/icmew56448.2022.9859515","DOIUrl":"https://doi.org/10.1109/icmew56448.2022.9859515","url":null,"abstract":"","PeriodicalId":106759,"journal":{"name":"2022 IEEE International Conference on Multimedia and Expo Workshops (ICMEW)","volume":"57 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-07-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133543674","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Efficient Topology Coding and Payload Partitioning Techniques for Neural Network Compression (NNC) Standard
Pub Date : 2022-07-18 DOI: 10.1109/ICMEW56448.2022.9859467
Jaakko Laitinen, Alexandre Mercat, Jarno Vanne, H. R. Tavakoli, Francesco Cricri, Emre B. Aksu, M. Hannuksela
The Neural Network Compression (NNC) standard aims to define a set of coding tools for efficient compression and transmission of neural networks. This paper addresses the high-level syntax (HLS) of NNC and proposes three HLS techniques for network topology coding and payload partitioning. Our first technique provides an efficient way to code prune-topology information: it removes redundancy in the bitmask and thereby improves coding efficiency by 4–99% over existing approaches. The second technique processes bitmasks in larger chunks instead of one bit at a time; it is shown to reduce the computational complexity of NNC encoding by 63% and of NNC decoding by 82%. Our third technique uses partial data counters to partition an NNC bitstream into uniformly sized units for more efficient data transmission. Even though the smaller partition sizes introduce some overhead, our network simulations show better throughput due to lower packet retransmission rates. To our knowledge, this is the first work to address the practical implementation aspects of HLS. The proposed techniques can be seen as key enabling factors for efficient adaptation and economical deployment of the NNC standard in a range of next-generation industrial and academic applications.
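The chunk-wise bitmask processing can be sketched briefly: instead of testing the prune mask one bit at a time, the mask is packed into bytes, fully pruned bytes are skipped outright, and only non-zero chunks are expanded. The chunk size, data layout, and function names are assumptions for illustration; this is not the standard's HLS or the authors' implementation.

```python
# Hedged sketch of chunk-wise (byte-wise) scanning of a prune bitmask.
import numpy as np

def gather_kept_weights(weights: np.ndarray, bitmask_bits: np.ndarray) -> np.ndarray:
    """Return the non-pruned weights, scanning the mask a byte at a time."""
    packed = np.packbits(bitmask_bits)           # 8 mask bits per byte (big-endian)
    kept = []
    for i, byte in enumerate(packed):
        if byte == 0:                            # skip fully pruned chunk
            continue
        for j in range(8):
            if byte & (0x80 >> j):               # bit set -> weight is kept
                idx = i * 8 + j
                if idx < weights.size:
                    kept.append(weights[idx])
    return np.array(kept)

# Example: 64 weights with a sparse prune mask
w = np.random.randn(64).astype(np.float32)
mask = (np.random.rand(64) > 0.8).astype(np.uint8)
kept = gather_kept_weights(w, mask)
```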
Citations: 0
OPSE: Online Per-Scene Encoding for Adaptive HTTP Live Streaming
Pub Date : 2022-07-18 DOI: 10.1109/ICMEW56448.2022.9859502
V. V. Menon, Hadi Amirpour, Christian Feldmann, M. Ghanbari, C. Timmerer
In live streaming applications, a fixed set of bitrate-resolution pairs (known as a bitrate ladder) is typically used during the entire streaming session in order to avoid the additional latency of finding scene transitions and optimized bitrate-resolution pairs for every video content. However, a per-scene optimized bitrate ladder may result in (i) decreased storage or delivery costs and/or (ii) increased Quality of Experience (QoE). This paper introduces an Online Per-Scene Encoding (OPSE) scheme for adaptive HTTP live streaming applications. In this scheme, scene transitions and optimized bitrate-resolution pairs for every scene are predicted using Discrete Cosine Transform (DCT)-energy-based low-complexity spatial and temporal features. Experimental results show that, on average, OPSE yields bitrate savings of up to 48.88% in certain scenes while maintaining the same VMAF, compared to the reference HTTP Live Streaming (HLS) bitrate ladder, without any noticeable additional latency in streaming.
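DCT-energy features of the general kind named in the abstract can be sketched as follows: a spatial texture measure from block DCT energy and a temporal measure from the DCT energy of frame differences, where a sharp jump in the temporal measure could flag a scene transition. The block size, weighting, and function names are assumptions, not the authors' exact feature definitions.

```python
# Hedged sketch of low-complexity spatial and temporal DCT-energy features.
import numpy as np
from scipy.fft import dctn

def block_dct_energy(frame: np.ndarray, block: int = 32) -> float:
    """Mean AC-coefficient energy over non-overlapping luma blocks."""
    h, w = frame.shape
    energies = []
    for y in range(0, h - block + 1, block):
        for x in range(0, w - block + 1, block):
            coeffs = dctn(frame[y:y + block, x:x + block], norm="ortho")
            coeffs[0, 0] = 0.0                   # drop the DC term
            energies.append(np.sum(np.abs(coeffs)))
    return float(np.mean(energies))

def temporal_energy(prev: np.ndarray, curr: np.ndarray) -> float:
    """DCT energy of the frame difference as a low-cost motion proxy."""
    return block_dct_energy(np.abs(curr.astype(float) - prev.astype(float)))

# Example: two consecutive luma frames of a 1080p sequence
f0 = np.random.randint(0, 256, (1080, 1920)).astype(float)
f1 = np.random.randint(0, 256, (1080, 1920)).astype(float)
spatial_feature, temporal_feature = block_dct_energy(f1), temporal_energy(f0, f1)
```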
Citations: 1
No-Reference Light Field Image Quality Assessment Method Based on a Long-Short Term Memory Neural Network
Pub Date : 2022-07-18 DOI: 10.1109/ICMEW56448.2022.9859419
Sana Alamgeer, Mylène C. Q. Farias
Light Field (LF) cameras capture angular and spatial information and, consequently, require a large amount of memory and bandwidth resources. To reduce these requirements, LF contents generally need to undergo compression and transmission protocols. Since these techniques may introduce distortions, the design of Light Field Image Quality Assessment (LFI-IQA) methods is important to monitor the quality of the LFI content at the user side. In this work, we present a No-Reference (NR) LFI-IQA method that is based on a Long Short-Term Memory based Deep Neural Network (LSTM-DNN). The method is composed of two streams. The first stream extracts long-term-dependent distortion-related features from horizontal epipolar plane images, while the second stream processes bottleneck features of micro-lens images. The outputs of both streams are fused and supplied to a regression operation that generates a scalar value as the predicted quality score. Results show that the proposed method is robust and accurate, outperforming several state-of-the-art LF-IQA methods.
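The two-stream design can be sketched in a few lines: an LSTM summarizes a sequence of epipolar-plane-image features, a second branch embeds the micro-lens bottleneck features, and a regressor maps the fused vector to a single score. Feature dimensions, layer widths, and class names below are assumptions, not the paper's network.

```python
# Hedged sketch of a two-stream LSTM + regression model for NR LFI-IQA.
import torch
import torch.nn as nn

class TwoStreamLFIQA(nn.Module):
    def __init__(self, epi_dim=512, lens_dim=1280, hidden=128):
        super().__init__()
        self.epi_lstm = nn.LSTM(epi_dim, hidden, batch_first=True)
        self.lens_fc = nn.Sequential(nn.Linear(lens_dim, hidden), nn.ReLU())
        self.regressor = nn.Linear(2 * hidden, 1)

    def forward(self, epi_seq, lens_feat):
        _, (h_n, _) = self.epi_lstm(epi_seq)      # last hidden state of the EPI stream
        fused = torch.cat([h_n[-1], self.lens_fc(lens_feat)], dim=1)
        return self.regressor(fused).squeeze(1)   # scalar predicted quality score

# Example: a batch of 4 light fields, each with 9 EPI feature vectors
model = TwoStreamLFIQA()
score = model(torch.randn(4, 9, 512), torch.randn(4, 1280))
```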
Citations: 0
Persong: Multi-Modality Driven Music Recommendation System
Pub Date : 2022-07-18 DOI: 10.1109/ICMEW56448.2022.9859488
Haonan Cheng, Xiaoying Huang, Ruyu Zhang, Long Ye
In this work, we develop PerSong, a music recommendation system that can recommend personalised songs based on the user's current status. First, multi-modal physiological signals, namely visual and heart-rate signals, are collected and combined to construct multi-level temporal sequences. Then, we propose a Global-Local Similarity Function (GLSF) based music recommendation algorithm to establish a mapping between the user's current state and the music. Our demonstration has been presented at a number of exhibitions and has shown remarkable performance under diverse circumstances. We have made the core of our work publicly available: https://github.com/yrz7991/GLSF/tree/master.
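A global-local similarity of the general form the name suggests can be sketched briefly: one term compares whole-sequence summaries of the user state and a candidate track, another compares aligned local windows, and the two are blended; candidate tracks are then ranked by the blended score. The window size, blend weight, feature dimensions, and function names are assumptions, not the paper's GLSF (see the released code at the URL above for the actual definition).

```python
# Hedged sketch of a blended global/local cosine similarity for ranking tracks.
import numpy as np

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8))

def global_local_similarity(user_seq: np.ndarray, music_seq: np.ndarray,
                            window: int = 8, alpha: float = 0.5) -> float:
    """Blend of whole-sequence and window-level cosine similarities."""
    global_sim = cosine(user_seq.mean(axis=0), music_seq.mean(axis=0))
    n = min(len(user_seq), len(music_seq))
    local_sims = [cosine(user_seq[i:i + window].mean(axis=0),
                         music_seq[i:i + window].mean(axis=0))
                  for i in range(0, n - window + 1, window)]
    return alpha * global_sim + (1 - alpha) * float(np.mean(local_sims))

# Example: rank five candidate tracks against the user's current state sequence
user_state = np.random.rand(32, 16)               # 32 time steps, 16-dim features
tracks = [np.random.rand(32, 16) for _ in range(5)]
best = max(range(5), key=lambda k: global_local_similarity(user_state, tracks[k]))
```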
Citations: 0