
2020 IEEE International Symposium on Multimedia (ISM): Latest Publications

Towards Scalable Retrieval of Human Motion Episodes
Pub Date : 2020-12-01 DOI: 10.1109/ISM.2020.00015
Petra Budíková, J. Sedmidubský, J. Horvath, P. Zezula
With the increasing availability of human motion data captured in the form of 2D/3D skeleton sequences, more complex motion recordings need to be processed. In this paper, we focus on the similarity-based retrieval of motion episodes - medium-sized skeleton sequences that consist of multiple semantic actions and correspond to some logical motion unit (e.g., a figure skating performance). We examine two orthogonal approaches to the episode-matching task: (1) the deep learning approach that is traditionally used for processing short motion actions, and (2) the motion-word technique that transforms skeleton sequences into a text-like representation. Since the second approach is more promising, we propose a two-phase retrieval scheme that combines mature text-processing techniques with application-specific refinement methods. We demonstrate that this solution achieves promising results in both effectiveness and efficiency, and can be further indexed to implement scalable episode retrieval.
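As a rough illustration of the two-phase idea described above (the paper does not publish code on this page), the following Python sketch treats episodes as bags of motion-word tokens, retrieves candidates with a simple TF-IDF score over an inverted index, and re-ranks them with an order-aware refinement. The tokenisation scheme, scoring choices, and all function names are assumptions for illustration, not the authors' implementation.

```python
import math
from collections import Counter, defaultdict

def build_index(episodes):
    """episodes: dict of episode id -> list of motion-word tokens (assumed given)."""
    index = defaultdict(set)
    for ep_id, words in episodes.items():
        for w in set(words):
            index[w].add(ep_id)
    return index

def candidate_phase(query_words, episodes, index, top_k=10):
    """Phase 1: coarse bag-of-words retrieval with a simple TF-IDF score."""
    n = len(episodes)
    scores = Counter()
    for w, q_tf in Counter(query_words).items():
        postings = index.get(w, set())
        if not postings:
            continue
        idf = math.log(n / len(postings))
        for ep_id in postings:
            scores[ep_id] += q_tf * episodes[ep_id].count(w) * idf
    return [ep_id for ep_id, _ in scores.most_common(top_k)]

def refinement_phase(query_words, candidates, episodes):
    """Phase 2: order-aware re-ranking, here via longest common subsequence length."""
    def lcs(a, b):
        dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
        for i, x in enumerate(a):
            for j, y in enumerate(b):
                dp[i + 1][j + 1] = dp[i][j] + 1 if x == y else max(dp[i][j + 1], dp[i + 1][j])
        return dp[-1][-1]
    return sorted(candidates, key=lambda ep: lcs(query_words, episodes[ep]), reverse=True)
```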
Citations: 1
MPEG-DASH users quality of experience enhancement for MOOC videos
Pub Date : 2020-12-01 DOI: 10.1109/ISM.2020.00036
D. Sebai, Emna Mani
Dynamic Adaptive Streaming over HTTP (MPEG-DASH) ensures that online videos are displayed in good quality and without interruption, providing streaming suited to each display device and network connection. This is particularly useful for Massive Open Online Courses (MOOCs), where learners benefit from a better visual experience that improves their commitment and eases course assimilation. These MPEG-DASH benefits grow further when its parameters are chosen well. As a relatively recent branch, MPEG-DASH adaptive delivery remains a research field where efforts are still limited, even more so for MOOC videos. Most published work in this area focuses on Quality of Service (QoS) and the technical specifications of network transmission. In this paper, we consider the quality of the streamed content, which directly impacts the learners' Quality of Experience (QoE). To this end, we develop a content-aware dataset of several dashified MOOC videos, which is then used to study the most appropriate bitrates and segment durations for each type of MOOC video.
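The paper's dataset and measurements are not reproduced here; purely as a hedged illustration of how one might compare packaging configurations, the sketch below scores candidate (bitrate, segment duration) pairs with a toy QoE proxy built from playback statistics. The weighting, the example configurations, and the function names are invented for illustration and do not come from the paper.

```python
def qoe_proxy(avg_bitrate_kbps, stall_seconds, quality_switches,
              w_bitrate=1.0, w_stall=25.0, w_switch=2.5):
    """Toy QoE proxy: reward delivered bitrate, penalise stalling and quality switches."""
    return w_bitrate * avg_bitrate_kbps - w_stall * stall_seconds - w_switch * quality_switches

# Hypothetical packaging configurations for one MOOC video type
# (e.g. slide-based lectures vs. talking-head recordings).
configs = [
    {"bitrate_kbps": 400,  "seg_duration_s": 2},
    {"bitrate_kbps": 800,  "seg_duration_s": 4},
    {"bitrate_kbps": 1500, "seg_duration_s": 6},
]

def pick_config(configs, playback_stats):
    """playback_stats[i] = (avg_bitrate_kbps, stall_seconds, quality_switches)
    measured when streaming with configs[i]; keep the configuration with the best score."""
    best = max(range(len(configs)), key=lambda i: qoe_proxy(*playback_stats[i]))
    return configs[best]
```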
Citations: 1
FID: Frame Interpolation and DCT-based Video Compression
Pub Date : 2020-12-01 DOI: 10.1109/ISM.2020.00045
Yeganeh Jalalpour, Li-Yun Wang, W. Feng, Feng Liu
In this paper, we present a hybrid video compression technique that combines the advantages of residual coding techniques found in traditional DCT-based video compression and learning-based video frame interpolation to reduce the amount of residual data that needs to be compressed. Learning-based frame interpolation techniques use machine learning algorithms to predict frames but have difficulty with uncovered areas and non-linear motion. This approach uses DCT-based residual coding only on areas that are difficult for video interpolation and provides tunable compression for such areas through an adaptive selection of data to be encoded. Experimental data for both PSNR and the newer video multi-method assessment fusion (VMAF) metrics are provided. Our results show that we can reduce the amount of data required to represent a video stream compared with traditional video coding while outperforming video frame interpolation techniques in quality.
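As a minimal sketch of the selective residual idea (block-wise DCT coding applied only where interpolation fails), the following Python/NumPy snippet codes an 8x8 residual block only when its absolute error exceeds a threshold. The block size, threshold, and quantisation step are illustrative assumptions; this is not the authors' codec.

```python
import numpy as np
from scipy.fft import dctn, idctn

BLOCK = 8

def code_residual(true_frame, interp_frame, threshold=500.0, q=10.0):
    """DCT-code the residual of each 8x8 block only where interpolation is poor."""
    h, w = true_frame.shape
    coded = {}
    for y in range(0, h - h % BLOCK, BLOCK):
        for x in range(0, w - w % BLOCK, BLOCK):
            residual = (true_frame[y:y+BLOCK, x:x+BLOCK].astype(np.float32)
                        - interp_frame[y:y+BLOCK, x:x+BLOCK].astype(np.float32))
            if np.abs(residual).sum() < threshold:
                continue  # interpolation already good enough here: send nothing
            coeffs = dctn(residual, norm="ortho")
            coded[(y, x)] = np.round(coeffs / q)  # coarse scalar quantisation
    return coded

def reconstruct(interp_frame, coded, q=10.0):
    """Add the decoded residual blocks back onto the interpolated frame."""
    out = interp_frame.astype(np.float32).copy()
    for (y, x), coeffs in coded.items():
        out[y:y+BLOCK, x:x+BLOCK] += idctn(coeffs * q, norm="ortho")
    return np.clip(out, 0, 255).astype(np.uint8)
```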
Citations: 2
Audio Captioning Based on Combined Audio and Semantic Embeddings
Pub Date : 2020-12-01 DOI: 10.1109/ISM.2020.00014
Aysegül Özkaya Eren, M. Sert
Audio captioning is a recently proposed task for automatically generating a textual description of a given audio clip. Most existing approaches use the encoder-decoder model without using semantic information. In this study, we propose a bi-directional Gated Recurrent Unit (BiGRU) model based on an encoder-decoder architecture using audio and semantic embeddings. To obtain semantic embeddings, we extract subject-verb embeddings using the subjects and verbs from the audio captions. We use a Multilayer Perceptron classifier to predict subject-verb embeddings of test audio clips at the testing stage. For audio feature extraction, in addition to log Mel energies, we use a pretrained audio neural network (PANN) as a feature extractor, used for the first time in the audio captioning task to explore the usability of audio embeddings in audio captioning. We combine the audio and semantic embeddings to feed the BiGRU-based encoder-decoder model. We then evaluate our model on two audio captioning datasets: Clotho and AudioCaps. Experimental results show that the proposed BiGRU-based deep model significantly outperforms state-of-the-art results across different evaluation metrics, and that the inclusion of semantic information enhances the captioning performance.
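To make the architecture description concrete, here is a minimal PyTorch sketch of an encoder-decoder with a BiGRU audio encoder whose pooled state is fused with a semantic (subject-verb) vector before decoding. The dimensions, pooling, and fusion layer are assumptions for illustration, not the authors' exact model.

```python
import torch
import torch.nn as nn

class CaptionModel(nn.Module):
    """Toy encoder-decoder: BiGRU over audio features, fused with a semantic vector."""
    def __init__(self, audio_dim=2048, sem_dim=300, hidden=256, vocab=5000):
        super().__init__()
        self.encoder = nn.GRU(audio_dim, hidden, batch_first=True, bidirectional=True)
        self.fuse = nn.Linear(2 * hidden + sem_dim, hidden)
        self.embed = nn.Embedding(vocab, hidden)
        self.decoder = nn.GRU(hidden, hidden, batch_first=True)
        self.out = nn.Linear(hidden, vocab)

    def forward(self, audio_feats, sem_vec, captions):
        # audio_feats: (B, T, audio_dim); sem_vec: (B, sem_dim); captions: (B, L) token ids
        enc_out, _ = self.encoder(audio_feats)
        pooled = enc_out.mean(dim=1)                      # summarise the audio sequence
        h0 = torch.tanh(self.fuse(torch.cat([pooled, sem_vec], dim=-1))).unsqueeze(0)
        dec_out, _ = self.decoder(self.embed(captions), h0)
        return self.out(dec_out)                          # (B, L, vocab) logits
```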
Citations: 25
Multi-view Neural Networks for Raw Audio-based Music Emotion Recognition
Pub Date : 2020-12-01 DOI: 10.1109/ISM.2020.00037
Na He, Sam Ferguson
In Music Emotion Recognition (MER), most existing research uses human-engineered audio features as learning model inputs, which requires domain knowledge and considerable effort for feature extraction. We propose a novel end-to-end deep learning approach that addresses music emotion recognition as a regression problem, using the raw audio signal as input. We adopt multi-view convolutional neural networks as feature extractors to learn feature representations automatically. The extracted feature vectors are then merged and fed into two layers of Bidirectional Long Short-Term Memory to sufficiently capture temporal context. In this way, our model can recognize dynamic music emotion without requiring a heavy workload for domain knowledge learning and audio feature processing. Combined with data augmentation strategies, the experimental results show that our model outperforms the state-of-the-art baseline by a significant margin in R2 score (approximately 16%) on the Emotion in Music Database.
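A hedged PyTorch sketch of the multi-view idea follows: parallel 1-D convolution branches with different kernel sizes read the raw waveform, their features are concatenated, and a bidirectional LSTM emits per-step valence/arousal estimates. Kernel sizes, strides, and layer widths are illustrative assumptions rather than the published configuration.

```python
import torch
import torch.nn as nn

class MultiViewEmotionNet(nn.Module):
    """Toy multi-view model: parallel 1-D conv branches on raw audio, then a BiLSTM regressor."""
    def __init__(self, kernel_sizes=(8, 16, 32), channels=32, hidden=64):
        super().__init__()
        self.branches = nn.ModuleList([
            nn.Sequential(nn.Conv1d(1, channels, k, stride=4, padding=k // 2),
                          nn.ReLU(),
                          nn.MaxPool1d(4))
            for k in kernel_sizes
        ])
        self.lstm = nn.LSTM(channels * len(kernel_sizes), hidden,
                            batch_first=True, bidirectional=True)
        self.head = nn.Linear(2 * hidden, 2)   # valence and arousal per time step

    def forward(self, wav):                    # wav: (B, 1, samples)
        feats = [branch(wav) for branch in self.branches]
        t = min(f.shape[-1] for f in feats)    # align branch lengths before merging
        merged = torch.cat([f[..., :t] for f in feats], dim=1).transpose(1, 2)
        out, _ = self.lstm(merged)
        return self.head(out)                  # (B, T, 2) dynamic emotion estimates
```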
Citations: 6
Vid2Pix - A Framework for Generating High-Quality Synthetic Videos
Pub Date : 2020-12-01 DOI: 10.1109/ISM.2020.00010
O. O. Nedrejord, Vajira Lasantha Thambawita, S. Hicks, P. Halvorsen, M. Riegler
Data is arguably the most important resource today, as it fuels the algorithms powering services we use every day. However, in fields like medicine, publicly available datasets are few, and labeling medical datasets requires tedious effort from trained specialists. Generated synthetic data can be key to future successful healthcare clinical intelligence. Here, we present a GAN-based video generator demonstrating promising results.
Citations: 0
Dynamic Segment Repackaging at the Edge for HTTP Adaptive Streaming
Pub Date : 2020-12-01 DOI: 10.1109/ISM.2020.00009
Jesús Aguilar Armijo, Babak Taraghi, C. Timmerer, H. Hellwagner
Adaptive video streaming systems typically support different media delivery formats, e.g., MPEG-DASH and HLS, replicating the same content multiple times into the network. Such a diversified system results in inefficient use of storage, caching, and bandwidth resources. The Common Media Application Format (CMAF) emerges to simplify HTTP Adaptive Streaming (HAS), providing a single encoding and packaging format of segmented media content and offering the opportunities of bandwidth savings, more cache hits, and less storage needed. However, CMAF is not yet supported by most devices. To solve this issue, we present a solution where we maintain the main advantages of CMAF while supporting heterogeneous devices using different media delivery formats. For that purpose, we propose to dynamically convert the content from CMAF to the desired media delivery format at an edge node. We study the bandwidth savings with our proposed approach using an analytical model and simulation, resulting in bandwidth savings of up to 20% with different media delivery format distributions. We analyze the runtime impact of the required operations on the segmented content performed in two scenarios: (i) the classic one, with four different media delivery formats, and (ii) the proposed scenario, using CMAF-only delivery through the network. We compare both scenarios with different edge compute power assumptions. Finally, we perform experiments in a real video streaming testbed delivering MPEG-DASH using CMAF content to serve a DASH and an HLS client, performing the media conversion for the latter one.
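The paper's analytical model is not reproduced here; the snippet below is only a deliberately simplified, hypothetical cold-cache model contrasting per-format backhaul fetches with a single CMAF fetch plus edge repackaging. Real savings depend on request distributions, cache hit rates, and conversion cost, which this toy model ignores.

```python
def backhaul_bytes(segment_size, format_shares, cmaf_only):
    """Toy origin->edge traffic model for one segment on a cold cache.
    format_shares: delivery format -> fraction of clients requesting it.
    Per-format delivery fetches one copy per distinct requested format;
    CMAF-only delivery fetches a single copy and repackages at the edge."""
    if cmaf_only:
        return segment_size
    requested_formats = [f for f, share in format_shares.items() if share > 0]
    return segment_size * len(requested_formats)

# Hypothetical request mix across delivery formats.
shares = {"CMAF": 0.1, "DASH/MP4": 0.4, "HLS/TS": 0.4, "Smooth": 0.1}
seg_bytes = 2_000_000
saving = 1 - backhaul_bytes(seg_bytes, shares, True) / backhaul_bytes(seg_bytes, shares, False)
print(f"Backhaul saving on a cold-cache segment in this toy model: {saving:.0%}")
```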
Citations: 6
Extraction of Frame Sequences in the Manga Context
Pub Date : 2020-12-01 DOI: 10.1109/ISM.2020.00023
Christian Roggia, Fabio Persia
Manga are one of the most popular forms of comics consumed at a global level. Unfortunately, this kind of media was not designed for digital consumption, and consequently its format does not fit well into small areas such as smartphone screens. To cope with this issue, in this paper we propose a novel approach to comics segmentation and sequencing that takes advantage of existing machine learning concepts to generate an artificial intelligence (AI) capable of correctly detecting panels within an image. The regions proposed by the AI are then used to generate a grid that acts as anchor points for a mobile application, guiding the reader during navigation and enabling fully responsive Manga. The developed approach achieves better overall precision and recall, as well as higher fault tolerance, than state-of-the-art approaches. The reliability of the method is also largely satisfactory for real-world scenarios, so we are about to finalize an app implementing it for release soon; future work will be devoted to generalizing our approach to all comics formats.
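As one small, assumed building block of such a navigation grid (not the authors' code), the sketch below orders detected panel bounding boxes into conventional manga reading order: rows from top to bottom, and panels right-to-left within each row.

```python
def reading_order(panels, row_tolerance=40):
    """Sort detected panel boxes (x, y, w, h) into manga reading order:
    rows top-to-bottom, panels right-to-left inside each row."""
    panels = sorted(panels, key=lambda p: p[1])          # by top edge
    rows, current = [], []
    for p in panels:
        if current and abs(p[1] - current[0][1]) > row_tolerance:
            rows.append(current)                         # start a new row of panels
            current = []
        current.append(p)
    if current:
        rows.append(current)
    ordered = []
    for row in rows:
        ordered.extend(sorted(row, key=lambda p: p[0] + p[2], reverse=True))
    return ordered
```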
Citations: 1
Audio Steganography Algorithm Based on Genetic Algorithm for MDCT Coefficient Adjustment for AAC
Pub Date : 2020-12-01 DOI: 10.1109/ISM.2020.00026
Chen Li, Xiaodong Zhang, Tao Luo, Lihua Tian
An AAC steganography algorithm based on a genetic algorithm and MDCT coefficient adjustment is proposed. The algorithm selects the small-value region of the MDCT coefficients as embedding positions, and the coefficients belonging to codebooks 1/2 are the ones designed to change. To better resist steganalysis, a genetic algorithm is used to optimize the coefficient changes. Experimental results show that the algorithm achieves good embedding capacity, strong steganographic security, and good imperceptibility.
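A hedged Python sketch of the optimisation loop follows: a genetic algorithm searches for a binary mask selecting which small-value (integer, quantised) MDCT coefficients act as carriers, with a toy fitness that penalises the number of coefficient changes needed to embed the payload by parity matching. The representation, fitness terms, and GA parameters are assumptions for illustration only.

```python
import random

def fitness(mask, coeffs, bits, alpha=1.0, beta=0.1):
    """Toy fitness: masked coefficients (in order) carry one payload bit each via
    parity matching; penalise required changes and wasted carrier positions."""
    carriers = [c for m, c in zip(mask, coeffs) if m]
    if len(carriers) < len(bits):
        return float("-inf")  # not enough capacity for the payload
    changes = sum(1 for c, b in zip(carriers, bits) if (abs(int(c)) & 1) != b)
    return -(alpha * changes + beta * (len(carriers) - len(bits)))

def genetic_search(coeffs, bits, pop=30, gens=100, p_mut=0.02):
    """Evolve carrier-selection masks with elitism, one-point crossover and bit-flip mutation."""
    n = len(coeffs)
    population = [[random.randint(0, 1) for _ in range(n)] for _ in range(pop)]
    for _ in range(gens):
        population.sort(key=lambda m: fitness(m, coeffs, bits), reverse=True)
        parents = population[: pop // 2]
        children = []
        while len(parents) + len(children) < pop:
            a, b = random.sample(parents, 2)
            cut = random.randrange(1, n)
            child = a[:cut] + b[cut:]
            children.append([g ^ 1 if random.random() < p_mut else g for g in child])
        population = parents + children
    return max(population, key=lambda m: fitness(m, coeffs, bits))
```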
Citations: 1
Melody-Conditioned Lyrics Generation with SeqGANs
Pub Date : 2020-10-28 DOI: 10.1109/ISM.2020.00040
Yihao Chen, Alexander Lerch
Automatic lyrics generation has received attention from both music and AI communities for years. Early rule-based approaches have, due to increases in computational power and the evolution of data-driven models, mostly been replaced with deep-learning-based systems. Many existing approaches, however, either rely heavily on prior knowledge in music and lyrics writing or oversimplify the task by largely discarding melodic information and its relationship with the text. We propose an end-to-end melody-conditioned lyrics generation system based on Sequence Generative Adversarial Networks (SeqGAN), which generates a line of lyrics given the corresponding melody as the input. Furthermore, we investigate the performance of the generator with an additional input condition: the theme or overarching topic of the lyrics to be generated. We show that the input conditions have no negative impact on the evaluation metrics while enabling the network to produce more meaningful results.
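As a minimal, assumed sketch of the conditioning pathway (the adversarial SeqGAN training loop is omitted), the PyTorch module below encodes a melody as a sequence of note features and uses the resulting state to initialise a token-level GRU that predicts lyric tokens; feature choices, dimensions, and names are illustrative.

```python
import torch
import torch.nn as nn

class MelodyConditionedGenerator(nn.Module):
    """Toy generator: a melody encoder conditions a token-level GRU language model.
    In a SeqGAN setup this generator would be trained with policy gradients against
    a discriminator; only the conditioning pathway is sketched here."""
    def __init__(self, note_dim=3, vocab=8000, hidden=256):
        super().__init__()
        self.melody_enc = nn.GRU(note_dim, hidden, batch_first=True)
        self.embed = nn.Embedding(vocab, hidden)
        self.lm = nn.GRU(hidden, hidden, batch_first=True)
        self.out = nn.Linear(hidden, vocab)

    def forward(self, melody, tokens):
        # melody: (B, N, note_dim), e.g. pitch/duration/rest per note
        # tokens: (B, L) lyric token ids for teacher forcing or rollout
        _, h = self.melody_enc(melody)          # (1, B, hidden) melody summary
        dec_out, _ = self.lm(self.embed(tokens), h)
        return self.out(dec_out)                # (B, L, vocab) next-token logits
```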
Citations: 17