2013 IEEE International Symposium on Broadband Multimedia Systems and Broadcasting (BMSB): Latest Publications

Pitch Marking Using the Fundamental Signal for Speech Modifications via TDPSOLA

Pub Date : 2013-12-09 DOI: 10.1109/ISM.2013.28

F. Ykhlef, L. Bendaouia

The quality of synthetic speech produced by pitch and duration modifications via the Time Domain Pitch Synchronous Overlap Add method (TD-PSOLA) relies on accurate positioning of pitch marks. In this paper, we propose a new pitch marking technique for voiced regions based on the fundamental signal of the speech waveform. Using the valleys of the fundamental signal, we locate a set of precise intervals in which the exact instants of the pitch marks are expected to be found. The fundamental signal is composed only of the fundamental frequency (pitch) of speech and is represented by a specific signal named the "mean based signal" (MBS). The optimal pitch marks are found by extracting the global peak instants within the obtained intervals. To improve the performance of the proposed technique, we add a post-processing stage that corrects erroneous pitch marks that may occur due to synchronization problems. The proposed technique is evaluated on the CMU ARCTIC database using objective and subjective measures. The experiments demonstrate that the proposed technique allows high-quality pitch and duration modifications via TD-PSOLA.
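
The interval-then-peak idea in this abstract can be illustrated with a minimal sketch. It assumes a fundamental signal is already available (here a plain sinusoid stands in for it), splits the time axis at that signal's valleys, and takes the largest waveform sample in each interval as the pitch mark. The function names and the synthetic signals are hypothetical; the paper's MBS construction and its post-processing stage are not reproduced.

```python
import math

def local_minima(x):
    """Indices i where x[i] is lower than its left neighbour and not higher than its right."""
    return [i for i in range(1, len(x) - 1) if x[i - 1] > x[i] <= x[i + 1]]

def pitch_marks(speech, fundamental):
    """One mark per interval between consecutive valleys of `fundamental`:
    the index of the largest `speech` sample inside that interval."""
    valleys = local_minima(fundamental)
    marks = []
    for start, end in zip(valleys, valleys[1:]):
        segment = speech[start:end]
        marks.append(start + max(range(len(segment)), key=segment.__getitem__))
    return marks

# Synthetic voiced segment (assumption): 100 Hz pitch at 8 kHz, i.e. 80 samples per period
fs, f0, n = 8000, 100, 400
fundamental = [math.sin(2 * math.pi * f0 * t / fs) for t in range(n)]
speech = [
    math.sin(2 * math.pi * f0 * t / fs)
    + 0.5 * math.sin(2 * math.pi * 2 * f0 * t / fs + 0.3)
    for t in range(n)
]

marks = pitch_marks(speech, fundamental)
print(marks)  # consecutive marks are one pitch period (80 samples) apart
```

On real speech the fundamental signal would have to be estimated first, and marks near voicing boundaries would likely need the kind of correction step the abstract mentions.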

Citations: 3

An Improvement in Media Discovery Service Using Name Spotting

Pub Date : 2013-12-09 DOI: 10.1109/ISM.2013.83

Manish Goswami, Lan Yang

The Digital Object Repository in the Digital Object Architecture stores a large number of audio/video media files. The lack of metadata in these files limits the ability of the media discovery service in the Digital Object Architecture to search them. In this paper, we design a system that uses a name spotting module to extract names, stores the extracted names with the audio/video media files, simulates the media discovery service, and reports the findings on the resulting improvement in media file search.

Citations: 0

Predicting Key Recognition Difficulty in Polyphonic Audio

Pub Date : 2013-12-09 DOI: 10.1109/ISM.2013.82

C. Chuan, Aleksey Charapko

In this paper, we present statistical models to predict the difficulty of recognizing musical keys from polyphonic audio signals. Automatic audio key finding has been studied for many years, and various approaches have been proposed and reported. Reports of these methods' performance are usually based on the proposers' own data sets. Without details on the data set, i.e., how challenging it is, directly comparing the effectiveness of these methods is not meaningful, or even possible. Thus, in this study we focus on predicting the difficulty level of key recognition as perceived by human experts. Given an audio recording, represented by its extracted acoustic features, we apply multiple linear regression and a proportional odds model to predict the difficulty level of the recording, which experts annotate as an integer on a 5-point Likert scale. We use four metrics to evaluate our prediction results: root mean square error, Pearson correlation coefficient, exact accuracy, and adjacent accuracy. We also examine the differences between experts' annotations and discuss their consistency.
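
The four evaluation metrics named in this abstract can be sketched as follows. The definitions of exact accuracy (rounded prediction equals the annotated level) and adjacent accuracy (rounded prediction is within one level) are assumptions based on common usage for ordinal scales, not taken from the paper; the function name and sample values are hypothetical.

```python
import math

def evaluate(predicted, annotated):
    """predicted: real-valued model outputs; annotated: expert ratings (integers 1..5)."""
    n = len(annotated)
    rmse = math.sqrt(sum((p - a) ** 2 for p, a in zip(predicted, annotated)) / n)

    mean_p, mean_a = sum(predicted) / n, sum(annotated) / n
    cov = sum((p - mean_p) * (a - mean_a) for p, a in zip(predicted, annotated))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_a = sum((a - mean_a) ** 2 for a in annotated)
    pearson = cov / math.sqrt(var_p * var_a)

    # Round to the nearest level and clip to the 1..5 scale before computing accuracies
    levels = [min(5, max(1, round(p))) for p in predicted]
    exact = sum(l == a for l, a in zip(levels, annotated)) / n
    adjacent = sum(abs(l - a) <= 1 for l, a in zip(levels, annotated)) / n
    return {"rmse": rmse, "pearson": pearson, "exact": exact, "adjacent": adjacent}

# Made-up predictions and annotations for five recordings
scores = evaluate([1.2, 2.6, 3.1, 4.8, 2.0], [1, 2, 3, 5, 4])
print(scores)
```

Adjacent accuracy is the more forgiving of the two accuracy figures, which matters when the experts themselves disagree by a level.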

Citations: 1

Accurate Detection of Moving Objects in Traffic Video Streams over Limited Bandwidth Networks

Pub Date : 2013-12-09 DOI: 10.1109/ISM.2013.20

Bo-Hao Chen, Shih-Chia Huang

Automated detection of moving objects is an essential task for any intelligent transportation system. However, conventional motion detection techniques often lose moving objects because of bit-rate variation in video streams transmitted via wireless video communication systems. To achieve motion detection that is both reliable and accurate in variable bit-rate video streams, this paper proposes a novel motion detection approach that is based on grey relational analysis and integrates a multi-quality background generation module and a moving object detection module. As our experimental results demonstrate, the proposed approach attains superior motion detection performance compared to other state-of-the-art techniques in both qualitative and quantitative evaluations. In the quantitative evaluations, the F1 and Similarity accuracy scores of the proposed approach were up to 59.96% and 55.42% higher, respectively, than those of the compared techniques.
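
For readers unfamiliar with the two scores quoted here, the sketch below computes them from a binary detection mask and a ground-truth mask. It assumes the definitions commonly used in motion detection evaluation, F1 as the harmonic mean of precision and recall and Similarity as tp / (tp + fp + fn); the abstract does not spell these out, and the function name and sample masks are hypothetical.

```python
def f1_and_similarity(detected, truth):
    """detected, truth: equal-length sequences of 0/1 foreground flags, one per pixel."""
    tp = sum(1 for d, t in zip(detected, truth) if d and t)      # true positives
    fp = sum(1 for d, t in zip(detected, truth) if d and not t)  # false positives
    fn = sum(1 for d, t in zip(detected, truth) if t and not d)  # false negatives

    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    similarity = tp / (tp + fp + fn) if tp + fp + fn else 0.0
    return f1, similarity

# Flattened 6-pixel masks, made up for illustration
detected = [1, 1, 0, 0, 1, 0]
truth = [1, 0, 0, 1, 1, 0]
f1, similarity = f1_and_similarity(detected, truth)
print(f1, similarity)
```

Both scores ignore true negatives, so a large static background does not inflate them.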

Citations: 2

Nested Event Model for Multimedia Narratives

Pub Date : 2013-12-09 DOI: 10.1109/ISM.2013.26

Ricardo Rios M. do Carmo, L. Soares, M. Casanova

The proliferation of multimedia narratives has contributed to what is known as the "crisis of choice", which demands a much more active participation on the part of the user to consume multimedia content. To address this issue, a strategy is to offer users efficient search mechanisms, sometimes based on ontologies. However, one may argue that such mechanisms are often based on abstractions that do not adequately capture the essential aspects of multimedia narratives. This paper proposes a conceptual model to specify multimedia narratives that overcomes this limitation. The model is based on the notion of event and is therefore called Nested Event Model (NEMo). The paper also includes a complete example to illustrate the use of the model.

Citations: 2

Improving Computational Efficiency of 3D Point Cloud Reconstruction from Image Sequences

Pub Date : 2013-12-09 DOI: 10.1109/ISM.2013.101

Chih-Hsiang Chang, N. Kehtarnavaz

Levenberg-Marquardt optimization, which is computationally expensive, is normally used in 3D point cloud reconstruction from image sequences. This paper presents a two-stage camera pose estimation approach in which an initial camera pose is obtained during the first stage and refined during the second stage. This approach does not require Levenberg-Marquardt optimization or LU matrix decomposition for computing the projection matrix, and thus provides a more computationally efficient 3D point cloud reconstruction than existing approaches. The results obtained using real video sequences indicate that the introduced approach yields lower re-projection errors as well as faster 3D point cloud reconstruction.
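
The re-projection error used as the quality measure here can be sketched as the mean pixel distance between observed image points and 3D points projected through a 3x4 projection matrix. The camera parameters, point values, and function names below are hypothetical, and the two-stage pose estimation itself is not reproduced.

```python
import math

def project(P, X):
    """Project a 3D point X = (x, y, z) through a 3x4 projection matrix P (list of rows)."""
    xh = (*X, 1.0)  # homogeneous coordinates
    u, v, w = (sum(p * c for p, c in zip(row, xh)) for row in P)
    return u / w, v / w

def mean_reprojection_error(P, points3d, observed2d):
    """Average Euclidean distance, in pixels, between projections and observations."""
    errors = [math.dist(project(P, X), obs) for X, obs in zip(points3d, observed2d)]
    return sum(errors) / len(errors)

# Assumed camera: focal length 500 px, principal point (320, 240), identity pose
P = [
    [500.0, 0.0, 320.0, 0.0],
    [0.0, 500.0, 240.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
]
points3d = [(0.0, 0.0, 2.0), (0.2, -0.1, 4.0)]
observed2d = [(320.0, 240.0), (345.0, 231.5)]

err = mean_reprojection_error(P, points3d, observed2d)
print(err)
```

A lower value means the estimated pose and reconstructed points explain the image measurements more closely.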

Citations: 1

A Cross-Stack Predictive Control Framework for Multimedia Applications

Pub Date : 2013-12-09 DOI: 10.1109/ISM.2013.77

Guangyi Cao, A. Ravindran, S. Kamalasadan, B. Joshi, A. Mukherjee

We demonstrate a novel cross-stack, control-theoretic approach to designing a predictive controller that automatically tracks changes in the multimedia workload to maintain a desired metric of application quality while minimizing power consumption.

Citations: 4

Low Complexity Video Encoding and High Complexity Decoding for UAV Reconnaissance and Surveillance

Pub Date : 2013-12-09 DOI: 10.1109/ISM.2013.34

Malavika Bhaskaranand, J. Gibson

Conventional video compression schemes such as H.264/AVC use a high complexity encoder with block motion estimation (ME) and a low complexity, low latency decoder. However, unmanned aerial vehicle (UAV) reconnaissance and surveillance applications require low complexity encoders but can accommodate high complexity decoders. Moreover, the video sequences in these applications often have primarily global motion due to the known movement of the UAV and camera mounts. Motivated by this scenario, we propose and investigate a low complexity encoder with global motion based frame prediction and no block ME. For fly-over videos, our encoder achieves bit rate savings of more than 40% over an H.264 encoder with ME block size restricted to 8 × 8, at lower complexity. We also develop a high complexity decoder based on Kalman filtering along motion trajectories and show average PSNR improvements of up to 0.5 dB with respect to a classic low complexity decoder.
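
The PSNR figure quoted in this abstract follows the standard definition, 10 · log10(peak² / MSE). The sketch below computes it for 8-bit samples; the function name and the sample values are made up for illustration and are not from the paper.

```python
import math

def psnr(reference, decoded, peak=255):
    """Peak signal-to-noise ratio in dB between two equal-length sample sequences."""
    mse = sum((r - d) ** 2 for r, d in zip(reference, decoded)) / len(reference)
    return math.inf if mse == 0 else 10 * math.log10(peak ** 2 / mse)

# Four pixels of a reference frame and of a decoded frame (hypothetical values)
reference = [100, 120, 140, 160]
decoded = [101, 119, 142, 158]
value = psnr(reference, decoded)
print(round(value, 2))
```

Because the scale is logarithmic, a 0.5 dB gain corresponds to roughly an 11% reduction in mean squared error.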

Citations: 8

Efficient Super Resolution Using Edge Directed Unsharp Masking Sharpening Method

Pub Date : 2013-12-09 DOI: 10.1109/ISM.2013.100

Kuo-Shiuan Peng, F. Lin, Yi-Pai Huang, H. Shieh

This paper investigated the potential for real-time implementation of single-image super resolution using the edge directed unsharp masking sharpening (EDUMS) method. To achieve an efficient real-time implementation with unsharp masking sharpening, the resolution enhancement process needed only simple filtering operations without iterations. In addition, with edge directed information as the prior of the unsharp masking sharpening method, the jaggy artifact was efficiently suppressed. The proposed method produced high resolution images with clear edge structures, vivid details, and minimal artifacts.
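
Plain unsharp masking, the non-iterative filtering step this abstract builds on, adds back a scaled difference between the signal and a blurred copy of it. The one-dimensional sketch below shows only that basic operation with a 3-tap box blur; the edge-directed prior of EDUMS is not reproduced, and the function names, blur kernel, and `amount` parameter are assumptions.

```python
def box_blur(signal):
    """3-tap moving average with the edge samples replicated."""
    padded = [signal[0]] + list(signal) + [signal[-1]]
    return [(padded[i] + padded[i + 1] + padded[i + 2]) / 3 for i in range(len(signal))]

def unsharp_mask(signal, amount=1.0):
    """sharpened = signal + amount * (signal - blurred)"""
    blurred = box_blur(signal)
    return [s + amount * (s - b) for s, b in zip(signal, blurred)]

# A step edge, as might appear after upscaling a row of pixels
edge = [10, 10, 10, 40, 40, 40]
sharp = unsharp_mask(edge)
print(sharp)  # [10.0, 10.0, 0.0, 50.0, 40.0, 40.0]
```

The undershoot and overshoot on either side of the step steepen the edge, which is also why unguided unsharp masking can exaggerate jagged contours.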

Citations: 5

Recognition of Action in Broadcast Basketball Videos on the Basis of Global and Local Pairwise Representation

Pub Date : 2013-12-09 DOI: 10.1109/ISM.2013.32

Masaki Takahashi, M. Naemura, Mahito Fujii, J. Little

A new feature-representation method for recognizing actions in broadcast videos, which focuses on the relationship between human actions and camera motions, is proposed. With this method, key point trajectories are extracted as motion features in spatio-temporal sub-regions called "spatio-temporal multiscale bags" (STMBs). Global representations and local representations from one sub-region in the STMBs are then combined to create a "glocal pairwise representation" (GPR). The GPR considers the co-occurrence of camera motions and human actions. Finally, two-stage SVM classifiers are trained with STMB-based GPRs, and specified human actions in video sequences are identified. It was experimentally confirmed that the proposed method can robustly detect specific human actions in broadcast basketball videos.

Citations: 5
