
Latest publications: 2015 IEEE International Symposium on Multimedia (ISM)

Modelling Video Rate Evolution in Adaptive Bitrate Selection
Pub Date : 2015-12-14 DOI: 10.1109/ISM.2015.65
Yusuf Sani, A. Mauthe, C. Edwards
Adaptive bitrate selection adjusts the quality of HTTP streaming video to a changing context. A number of different schemes have been proposed that use buffer state in the selection of the appropriate video rate. However, models describing the relationship between video quality levels and buffer occupancy are mostly based on heuristics, which often results in unstable and/or suboptimal quality. In this paper, we present a QoE-aware video rate evolution model based on buffer state changes. The scheme is evaluated within a real-world Internet environment, where it is shown to improve the stability of the video rate. Up to 27% gain in average video rate can be achieved compared to the baseline ABR. The average throughput utilisation at steady state reaches 100% in some of the investigated scenarios.
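The buffer-occupancy-to-rate mapping the abstract contrasts with can be illustrated by a minimal buffer-based selection sketch. The reservoir/cushion thresholds and the bitrate ladder below are hypothetical illustration values, not the paper's QoE-aware evolution model:

```python
# Minimal buffer-based ABR sketch. Thresholds and bitrate ladder are
# hypothetical illustration values, not the paper's model.

BITRATES_KBPS = [350, 700, 1500, 3000, 6000]  # available representations

def select_rate(buffer_s, reservoir_s=5.0, cushion_s=20.0):
    """Map current buffer occupancy (seconds) to a video rate.

    Below the reservoir, pick the lowest rate to avoid a stall; above
    the cushion, pick the highest; in between, step linearly through
    the bitrate ladder.
    """
    if buffer_s <= reservoir_s:
        return BITRATES_KBPS[0]
    if buffer_s >= cushion_s:
        return BITRATES_KBPS[-1]
    frac = (buffer_s - reservoir_s) / (cushion_s - reservoir_s)
    return BITRATES_KBPS[int(frac * (len(BITRATES_KBPS) - 1))]
```

Heuristic mappings of this kind are exactly what the paper argues can cause instability, since the selected rate jumps with every buffer fluctuation.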
Citations: 20
Characterization of the HEVC Coding Efficiency Advance Using 20 Scenes, ITU-T Rec. P.913 Compliant Subjective Methods, VQM, and PSNR
Pub Date : 2015-12-14 DOI: 10.1109/ISM.2015.38
Andrew Catellier, M. Pinson
The new video coding standard, MPEG-H Part 2 High Efficiency Video Coding (HEVC) or H.265, was developed to be roughly twice as efficient as H.264/AVC -- meaning H.265/HEVC could deliver the same quality as H.264/AVC using roughly half the bitrate. In this paper we describe a subjective experiment designed to test this claim. We present an experiment using 20 different 1080p 29.97 fps scenes and 12 impairment levels spanning MPEG-2, H.264/AVC and H.265/HEVC. Additionally we compare the results obtained from the subjective assessment to quality estimates from two objective metrics: VQM and PSNR. Our subjective results show that H.265/HEVC can deliver the same quality at half the bitrate compared to H.264/AVC and can perform better at one quarter the bitrate compared to MPEG-2 in many, but not all, situations. For all 20 scenes coded with H.265/HEVC at 4 Mbps mean opinion scores span 38% of the subjective scale, which indicates the importance of scene selection. Objective quality estimations of HEVC have a low correlation with subjective results (0.60 for VQM, 0.64 for PSNR).
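The correlations reported against subjective scores (0.60 for VQM, 0.64 for PSNR) are of the Pearson kind commonly used when validating objective quality metrics against mean opinion scores; a minimal sketch of that computation (the study may apply additional fitting steps before correlating):

```python
import math

def pearson(xs, ys):
    """Pearson linear correlation between objective quality estimates
    (e.g. VQM or PSNR values) and subjective mean opinion scores."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)
```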
Citations: 6
Calculating a Minimum Playable Duration for HTTP Streaming Media Segments
Pub Date : 2015-12-01 DOI: 10.1109/ISM.2015.39
M. Thornburgh
In streaming media over the Internet using HTTP, it is often desirable to minimize the duration of the media segments of a stream (for example, to reduce camera-to-viewer delay in streaming a live sports event). The operational behavior of Internet transport protocols limits the minimum duration of segments below which the segments can't be transferred in time to avoid a playback stall. This paper proposes a method to calculate the minimum duration for the segments of a media stream under given network transport conditions to avoid a playback stall. This method can be used to select the segment duration for a media stream under anticipated worst-case network conditions.
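A simplified version of the underlying constraint can be sketched as follows: each segment's fetch (request round trip plus transfer) must finish within the playback time the segment provides. The model below is an assumption-laden illustration, not the paper's exact derivation:

```python
def min_segment_duration(bitrate_bps, throughput_bps, rtt_s):
    """Smallest segment duration (seconds) such that each segment can
    be fetched before its playback deadline under the simplified model

        rtt + duration * bitrate / throughput <= duration.

    Requires throughput > bitrate; otherwise no duration avoids a stall.
    """
    if throughput_bps <= bitrate_bps:
        raise ValueError("throughput must exceed the media bitrate")
    return rtt_s / (1.0 - bitrate_bps / throughput_bps)
```

For example, a 2 Mbps stream over a 4 Mbps link with a 100 ms round trip gives a 0.2 s minimum duration under this model; slower links or higher per-request overhead push the minimum up.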
Citations: 1
Deep Learning for Imbalanced Multimedia Data Classification
Pub Date : 2015-12-01 DOI: 10.1109/ISM.2015.126
Yilin Yan, Min Chen, M. Shyu, Shu‐Ching Chen
Classification of imbalanced data is an important research problem as lots of real-world data sets have skewed class distributions in which the majority of data instances (examples) belong to one class and far fewer instances belong to others. While in many applications, the minority instances actually represent the concept of interest (e.g., fraud in banking operations, abnormal cell in medical data, etc.), a classifier induced from an imbalanced data set is more likely to be biased towards the majority class and show very poor classification accuracy on the minority class. Despite extensive research efforts, imbalanced data classification remains one of the most challenging problems in data mining and machine learning, especially for multimedia data. To tackle this challenge, in this paper, we propose an extended deep learning approach to achieve promising performance in classifying skewed multimedia data sets. Specifically, we investigate the integration of bootstrapping methods and a state-of-the-art deep learning approach, Convolutional Neural Networks (CNNs), with extensive empirical studies. Considering the fact that deep learning approaches such as CNNs are usually computationally expensive, we propose to feed low-level features to CNNs and prove its feasibility in achieving promising performance while saving a lot of training time. The experimental results show the effectiveness of our framework in classifying severely imbalanced data in the TRECVID data set.
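The bootstrapping idea paired with the CNN can be illustrated with a minimal class-balanced resampling sketch (one common bootstrapping strategy; the paper's exact scheme may differ):

```python
import random

def balanced_bootstrap(samples, labels, seed=0):
    """Resample every class with replacement up to the majority-class
    count, yielding a class-balanced training set from skewed data."""
    rng = random.Random(seed)
    by_class = {}
    for x, y in zip(samples, labels):
        by_class.setdefault(y, []).append(x)
    target = max(len(xs) for xs in by_class.values())
    out_x, out_y = [], []
    for y, xs in by_class.items():
        out_x.extend(rng.choices(xs, k=target))  # sample with replacement
        out_y.extend([y] * target)
    return out_x, out_y
```

The balanced set can then be fed (as low-level features, per the paper's speed argument) to the CNN classifier.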
Citations: 129
Run-Time Machine Learning for HEVC/H.265 Fast Partitioning Decision
Pub Date : 2015-12-01 DOI: 10.1109/ISM.2015.70
S. Momcilovic, N. Roma, L. Sousa, I. Milentijevic
A novel fast Coding Tree Unit partitioning method for the HEVC/H.265 encoder is proposed in this paper. The method relies on run-time trained neural networks for fast Coding Unit splitting decisions. In contrast to state-of-the-art solutions, it does not require any pre-training and adapts well to dynamic changes in video content. Through an efficient sampling strategy and a multi-threaded implementation, the presented technique successfully mitigates the computational overhead inherent to the training process, both in overall processing performance and in the initial encoding delay. The experiments show that the proposed method reduces HEVC/H.265 encoding time by up to 65% with negligible rate-distortion penalties.
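The run-time training loop can be sketched with a much-simplified stand-in: an online logistic model that learns the split/no-split decision from full rate-distortion searches performed on sampled coding units. The features and learning rate below are illustrative assumptions, not the paper's networks:

```python
import math

class OnlineSplitPredictor:
    """Tiny online logistic model deciding whether to split a coding
    unit. A much-simplified stand-in for the run-time trained neural
    networks in the paper; features (e.g. block variance, neighbour
    depths) and the learning rate are illustrative."""

    def __init__(self, n_features, lr=0.1):
        self.w = [0.0] * n_features
        self.b = 0.0
        self.lr = lr

    def predict(self, x):
        """Probability that the CU should be split."""
        z = self.b + sum(w * xi for w, xi in zip(self.w, x))
        return 1.0 / (1.0 + math.exp(-z))

    def update(self, x, split):
        """Online gradient step; `split` in {0, 1} comes from a full
        rate-distortion search on a sampled CU."""
        err = self.predict(x) - split
        self.w = [w - self.lr * err * xi for w, xi in zip(self.w, x)]
        self.b -= self.lr * err
```

During encoding, most CUs would use `predict` to skip the exhaustive search, while a sampled fraction still runs it and feeds `update`, keeping the model adapted to the current content.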
Citations: 15
Scrubbing Wheel: An Interaction Concept to Improve Video Content Navigation on Devices with Touchscreens
Pub Date : 2015-12-01 DOI: 10.1109/ISM.2015.20
Klaus Schöffmann, Lukas Burgstaller
We propose a new interface that facilitates content navigation in videos on devices with touchscreen interaction. This interface allows both coarse-grained and fine-grained navigation in an intuitive way and enables better performance when used to locate specific scenes in videos. We implemented this interface on a 5.5-inch smartphone and tested it with 24 users. Our results show that for video navigation tasks the proposed interface significantly outperforms the seeker-bar interface commonly used with video players on mobile devices. Moreover, we found that the interaction concept of the Scrubbing Wheel has a much lower perceived workload than the widely used seeker-bar, and it was the preferred tool for locating scenes in videos for all tested users in our study.
Citations: 3
Analysis and Transcoding Time Prediction of Online Videos
Pub Date : 2015-12-01 DOI: 10.1109/ISM.2015.100
Tewodros Deneke, S. Lafond, J. Lilius
Today, video content is delivered to a myriad of devices over different communication networks. Video delivery must be adapted to the available bandwidth, screen size, resolution and the decoding capability of the end-user devices. In this work we present an approach to predict the transcoding time of a video, given an input video and a set of transcoding parameters. To obtain enough information on the characteristics of real-world online videos and the transcoding parameters needed to model transcoding time, we built a video characteristics dataset using data collected from a large video-on-demand system, YouTube. The dataset contains a million randomly sampled video instances listing 10 fundamental video characteristics. We report our analysis of the dataset, which provides insightful statistics on fundamental online video characteristics that can be further exploited to optimize or model components of multimedia processing systems. We also present experimental results on transcoding time prediction models based on support vector machines, linear regression and a multi-layer perceptron feed-forward artificial neural network.
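As one hedged illustration of the regression-based predictors, here is a stdlib-only ordinary-least-squares fit mapping hypothetical video features (e.g. pixels per frame, frame count) to measured transcoding times; the paper's feature set and models are richer:

```python
def fit_linear(features, times):
    """Ordinary least squares via the normal equations, solved with
    Gaussian elimination. `features` is a list of feature vectors,
    `times` the measured transcoding times."""
    X = [[1.0] + list(f) for f in features]  # prepend bias term
    n = len(X[0])
    # normal equations A w = b with A = X^T X, b = X^T y
    A = [[sum(r[i] * r[j] for r in X) for j in range(n)] for i in range(n)]
    b = [sum(r[i] * t for r, t in zip(X, times)) for i in range(n)]
    # Gaussian elimination with partial pivoting
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        b[col], b[piv] = b[piv], b[col]
        for r in range(col + 1, n):
            f = A[r][col] / A[col][col]
            for c in range(col, n):
                A[r][c] -= f * A[col][c]
            b[r] -= f * b[col]
    w = [0.0] * n
    for r in range(n - 1, -1, -1):
        w[r] = (b[r] - sum(A[r][c] * w[c] for c in range(r + 1, n))) / A[r][r]
    return w  # [bias, coef_1, ..., coef_k]

def predict(w, f):
    """Predicted transcoding time for feature vector f."""
    return w[0] + sum(wi * fi for wi, fi in zip(w[1:], f))
```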
Citations: 6
Synthetic Voice Harmonization: A Fast and Precise Method
Pub Date : 2015-12-01 DOI: 10.1109/ISM.2015.122
Juan Gremes, Nicola Palavecino, Lucas Seeber, Santiago Herrero
Musicians often lack the ability to harmonize their voices within a track. To help with this, a tool can be developed that detects the scale or key in which a track is sung and synthesizes pitches to form a triad chord or a tetrachord (a combination of three or four notes that fits the scale's harmony) for each corresponding tone in the melody. In this paper, we present a fast and precise method to detect the pitch of the voice and shift it to the appropriate frequencies, consequently building up a harmony from the original melody. Four techniques are involved in this sequential process: segmentation into consonant and vowel intervals, pitch detection by the McLeod Pitch Method (MPM), functional harmony for establishing a cadence, and pitch shifting by means of a phase vocoder.
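The pitch-shifting targets for such a harmony follow from equal-temperament interval ratios; a minimal sketch (the major-triad intervals used here are an illustrative assumption, whereas the paper derives the intervals from the detected scale):

```python
# Equal temperament: one semitone multiplies frequency by 2^(1/12).
SEMITONE = 2 ** (1 / 12)

def triad_frequencies(f0, intervals=(0, 4, 7)):
    """Frequencies (Hz) of a chord built on fundamental f0, where
    `intervals` are semitone offsets (0-4-7 is a major triad)."""
    return [f0 * SEMITONE ** s for s in intervals]
```

Each harmony voice is then produced by pitch-shifting the detected fundamental by the corresponding ratio, e.g. with a phase vocoder as in the paper.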
Citations: 1
Automatic Content Curation System for Multiple Live Sport Video Streams
Pub Date : 2015-12-01 DOI: 10.1109/ISM.2015.17
Kazuki Fujisawa, Yuko Hirabe, H. Suwa, Yutaka Arakawa, K. Yasumoto
In this paper, we aim to develop a method to create personalized and high-presence multi-channel contents for a sport game through real-time content curation from various media streams captured/created by spectators. We use the live TV broadcast as ground truth data and construct a machine learning-based model to automatically conduct curation from multiple videos which spectators captured from different angles and zoom levels. The live TV broadcast of a baseball game follows curation rules that select a specific camera angle for specific scenes (e.g., a pitcher throwing a ball). As inputs for constructing a model, we use metadata such as image feature data (e.g., a pitcher is on the screen) in each fixed interval of baseball videos and game progress data (e.g., the inning number and the batting order). The output is the camera ID (among the multiple spectator cameras) at each point in time. For evaluation, we targeted Spring-Selection high-school baseball games. As training data, we used image features, game progress data, and the camera position at each point in time in the TV broadcast. We used videos of a baseball game captured from 7 different points in Hanshin Koshien Stadium with handy video cameras and generated a sample data set by dividing the videos into fixed-interval segments. We divided the sample data set into a training data set and a test data set and evaluated our method through two validation methods: (1) 10-fold cross-validation and (2) hold-out (e.g., learning on the first and second innings and testing on the third inning). As a result, our method predicted the camera switching timings with an accuracy (F-measure) of 72.53% on weighted average for the base camera work and 92.1% for the fixed camera work.
Citations: 5
A New Glowworm Swarm Optimization Based Clustering Algorithm for Multimedia Documents
Pub Date : 2015-12-01 DOI: 10.1109/ISM.2015.94
K. Pushpalatha, S. AnanthanarayanaV.
Due to the explosion of multimedia data, the demand for sophisticated multimedia knowledge discovery systems has increased. The multimodal nature of multimedia data is a big barrier to knowledge extraction, and representing multimodal data in a unimodal space is advantageous for any mining task. We first represent multimodal multimedia documents in a unimodal space by converting the multimedia objects into signal objects. The dynamic nature of glowworms motivated us to propose the Glowworm Swarm Optimization based Multimedia Document Clustering (GSOMDC) algorithm to group multimedia documents into topics. The better purity and entropy values indicate that the GSOMDC algorithm successfully clusters the multimedia documents into topics. The goodness of the clustering is further evaluated by performing cluster-based retrieval of multimedia documents with better precision values.
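The core glowworm swarm optimization dynamics (a luciferin update followed by movement toward brighter neighbours) can be sketched generically as follows; this is standard GSO in 2-D with a fixed decision range, not the paper's document-clustering variant, and the parameter values are illustrative:

```python
import math
import random

def gso_step(positions, luciferin, objective,
             rho=0.4, gamma=0.6, step=0.03, radius=1.0, seed=0):
    """One glowworm swarm optimization iteration: refresh each worm's
    luciferin from the objective, then move it a small step toward a
    randomly chosen brighter neighbour within `radius`."""
    rng = random.Random(seed)
    # luciferin update: decay plus reward proportional to fitness
    luciferin = [(1 - rho) * l + gamma * objective(p)
                 for l, p in zip(luciferin, positions)]
    new_pos = []
    for i, p in enumerate(positions):
        brighter = [j for j, q in enumerate(positions)
                    if luciferin[j] > luciferin[i]
                    and 0.0 < math.dist(p, q) < radius]
        if not brighter:
            new_pos.append(p)  # no brighter neighbour: stay put
            continue
        q = positions[rng.choice(brighter)]
        d = math.dist(p, q)
        new_pos.append(tuple(pi + step * (qi - pi) / d
                             for pi, qi in zip(p, q)))
    return new_pos, luciferin
```

Iterating such steps makes the worms congregate around local optima of the objective; in the clustering setting, each congregation point corresponds to a topic cluster.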
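As a rough illustration of how glowworm dynamics can produce clusters, the sketch below places one glowworm per document (the paper's signal-object representation is assumed to already be a numeric vector). The density-based fitness, the parameter values, and the deterministic move toward the single brightest neighbour are simplifying assumptions; standard GSO uses a probabilistic neighbour choice and an adaptive decision range:

```python
def gso_cluster(points, iters=60, rho=0.4, gamma=0.6, step=0.03, radius=0.5):
    """Minimal glowworm-swarm-optimization clustering sketch.

    Luciferin grows with local density, each glowworm drifts toward a
    brighter neighbour, and converged groups are read off as clusters.
    """
    pos = [list(p) for p in points]
    luc = [5.0] * len(pos)          # initial luciferin level

    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

    for _ in range(iters):
        # Luciferin update: decay plus a density term (neighbour count as fitness proxy).
        fit = [sum(1 for q in pos if dist(p, q) < radius) for p in pos]
        luc = [(1 - rho) * l + gamma * f for l, f in zip(luc, fit)]
        for i in range(len(pos)):
            p = pos[i]
            nbrs = [j for j in range(len(pos))
                    if j != i and dist(p, pos[j]) < radius and luc[j] > luc[i]]
            if not nbrs:
                continue
            j = max(nbrs, key=lambda n: luc[n])   # move toward the brightest neighbour
            d = dist(p, pos[j]) or 1e-12
            pos[i] = [x + step * (y - x) / d for x, y in zip(p, pos[j])]

    # Read off clusters: connected components of converged glowworms.
    labels, c = [-1] * len(pos), 0
    for i in range(len(pos)):
        if labels[i] != -1:
            continue
        labels[i], stack = c, [i]
        while stack:
            k = stack.pop()
            for j in range(len(pos)):
                if labels[j] == -1 and dist(pos[k], pos[j]) < radius / 2:
                    labels[j] = c
                    stack.append(j)
        c += 1
    return labels
```

Given topic ground truth, purity and entropy can then be computed by comparing the returned labels against the true topic of each document.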
{"title":"A New Glowworm Swarm Optimization Based Clustering Algorithm for Multimedia Documents","authors":"K. Pushpalatha, S. AnanthanarayanaV.","doi":"10.1109/ISM.2015.94","DOIUrl":"https://doi.org/10.1109/ISM.2015.94","url":null,"abstract":"Due to the explosion of multimedia data, the demand for the sophisticated multimedia knowledge discovery systems has been increased. The multimodal nature of multimedia data is the big barrier for knowledge extraction. The representation of multimodal data in a unimodal space will be more advantageous for any mining task. We initially represent the multimodal multimedia documents in a unimodal space by converting the multimedia objects into signal objects. The dynamic nature of the glowworms motivated us to propose the Glowworm Swarm Optimization based Multimedia Document Clustering (GSOMDC) algorithm to group the multimedia documents into topics. The better purity and entropy values indicates that the GSOMDC algorithm successfully clusters the multimedia documents into topics. The goodness of the clustering is evaluated by performing the cluster based retrieval of multimedia documents with better precision values.","PeriodicalId":250353,"journal":{"name":"2015 IEEE International Symposium on Multimedia (ISM)","volume":"37 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134028245","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 5