
2005 IEEE International Conference on Multimedia and Expo: Latest Publications

Speech-Based Visual Concept Learning Using Wordnet
Pub Date : 2005-07-06 DOI: 10.1109/ICME.2005.1521627
Xiaodan Song, Ching-Yung Lin, Ming-Ting Sun
Modeling visual concepts using supervised or unsupervised machine learning approaches is becoming increasingly important for video semantic indexing, retrieval, and filtering applications. Naturally, videos include multimodality data such as audio, speech, visual content, and text, which are combined to infer the overall semantic concepts. However, in the literature, most research has been conducted within only a single domain. In this paper, we propose an unsupervised technique that uses WordNet to build context-independent keyword lists for modeling desired visual concepts. Furthermore, we propose an extended speech-based visual concept (ESVC) model that reorders and extends these keyword lists by supervised learning based on multimodality annotation. Experimental results show that the context-independent models achieve performance comparable to conventional supervised learning algorithms, and that the ESVC model achieves about 53% and 28.4% improvement on two testing subsets of the TRECVID 2003 corpus over a state-of-the-art speech-based video concept detection algorithm.
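The keyword-list construction described in the abstract can be illustrated with a minimal sketch. The tiny `lexicon` below is a hypothetical stand-in for WordNet (with NLTK installed, `wordnet.synsets` would supply the synonym and hyponym relations), and `expand_keywords` is an illustrative helper, not the paper's algorithm:

```python
# Hypothetical stand-in for WordNet: seed words mapped to synonyms/hyponyms.
lexicon = {
    "car": {"synonyms": ["auto", "automobile"], "hyponyms": ["taxi", "truck"]},
    "road": {"synonyms": ["street"], "hyponyms": ["highway"]},
}

def expand_keywords(seeds, lexicon):
    """Build a deduplicated, context-independent keyword list from seed words."""
    keywords = []
    for seed in seeds:
        entry = lexicon.get(seed, {})
        for word in [seed] + entry.get("synonyms", []) + entry.get("hyponyms", []):
            if word not in keywords:
                keywords.append(word)
    return keywords

print(expand_keywords(["car", "road"], lexicon))
# → ['car', 'auto', 'automobile', 'taxi', 'truck', 'road', 'street', 'highway']
```

The ESVC step would then reorder and extend such a list using supervised learning over multimodal annotations.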
Citations: 2
Reliable video communication with multi-path streaming using MDC
Pub Date : 2005-07-06 DOI: 10.1109/ICME.2005.1521522
I. Lee, L. Guan
Video streaming demands high data rates and hard delay constraints, which raises several challenges on today's packet-based, best-effort Internet. In this paper, we propose an efficient multiple-description coding (MDC) technique based on video frame sub-sampling and cubic-spline interpolation to provide spatial diversity, such that no additional buffering delay or storage is required. The paper also analyzes the frame-dropping rate due to packet loss and drifting error under the multi-path streaming environment.
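The sub-sampling idea can be sketched in one dimension, under the simplifying assumption that a lost description is recovered from the surviving one by cubic-spline interpolation (real MDC sub-samples whole frames, not a toy sinusoid):

```python
import numpy as np
from scipy.interpolate import CubicSpline

# Split a smooth toy "scanline" into even and odd samples: two descriptions.
x = np.arange(16, dtype=float)
signal = 100.0 * np.sin(2 * np.pi * x / 16) + 128.0

even_x, odd_x = x[0::2], x[1::2]
even, odd = signal[0::2], signal[1::2]     # the two descriptions

# Suppose the odd description is lost: rebuild it from the even one.
spline = CubicSpline(even_x, even)
recovered = spline(odd_x[:-1])             # interior samples only
max_err = float(np.max(np.abs(recovered - odd[:-1])))
print(max_err)                             # small relative to the 100-amplitude signal
```

Because each description alone still interpolates well, no extra buffering delay is needed when one path fails.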
Citations: 18
Supporting rights checking in an MPEG-21 Digital Item Processing environment
Pub Date : 2005-07-06 DOI: 10.1109/ICME.2005.1521608
F. D. Keukelaere, T. DeMartini, Jeroen Bekaert, R. Walle
Within the world of multimedia, the new MPEG-21 standard is currently under development. The purpose of this new standard is to create an open framework for multimedia delivery and consumption. MPEG-21 masters the multitude of content and metadata types by standardizing the declaration of digital items in an XML-based format. In addition to standardizing the declaration of digital items, MPEG-21 also standardizes digital item processing, which enables the declaration of suggested uses of digital items. The rights expression language and rights data dictionary parts of MPEG-21 enable the declaration of what rights (permitted interactions) Users are given to digital items. In this paper, we describe how rights checking can be realized in an environment in which interactions with digital items are declared through digital item processing. We demonstrate how rights checking can be done when "critical" digital item base operations are called, and how rights context information can be gathered by tracking during the execution of digital item methods.
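The "rights check on critical base operations" pattern can be sketched as a guard that runs before each operation. All names here are hypothetical illustrations; real MPEG-21 REL licenses are XML documents, not a Python set:

```python
# Hypothetical (user, right) grants standing in for parsed REL licenses.
LICENSES = {("alice", "play"), ("alice", "print")}

def requires_right(right):
    """Wrap a critical base operation with a rights check for `right`."""
    def wrap(op):
        def checked(user, *args, **kwargs):
            if (user, right) not in LICENSES:
                raise PermissionError(f"{user} lacks the '{right}' right")
            return op(user, *args, **kwargs)
        return checked
    return wrap

@requires_right("play")
def play_resource(user, resource):
    return f"{user} plays {resource}"

print(play_resource("alice", "video.mp4"))  # → alice plays video.mp4
```

Rights context information could be accumulated inside `checked` as methods execute, which is the tracking idea the paper describes.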
Citations: 4
Evaluating keypoint methods for content-based copyright protection of digital images
Pub Date : 2005-07-06 DOI: 10.1109/ICME.2005.1521614
Larry Huston, R. Sukthankar, Yan Ke
This paper evaluates the effectiveness of keypoint methods for content-based protection of digital images. These methods identify a set of "distinctive" regions (termed keypoints) in an image and encode them using descriptors that are robust to expected image transformations. To determine whether particular images were derived from a protected image, the keypoints for both images are generated and their descriptors matched. We describe a comprehensive set of experiments to examine how keypoint methods cope with three real-world challenges: (1) loss of keypoints due to cropping; (2) matching failures caused by approximate nearest-neighbor indexing schemes; (3) degraded descriptors due to significant image distortions. While keypoint methods perform very well in general, this paper identifies cases where the accuracy of such methods degrades.
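The descriptor-matching step can be sketched with toy vectors and a distance-ratio test (a common way to reject ambiguous nearest neighbors; the toy random descriptors below stand in for real SIFT-style output, and this is not the paper's exact pipeline):

```python
import numpy as np

rng = np.random.default_rng(0)
protected = rng.normal(size=(50, 32))      # descriptors of the protected image
# A "derived" image: the first 10 descriptors, slightly perturbed.
query = protected[:10] + rng.normal(scale=0.01, size=(10, 32))

def match(query, protected, ratio=0.8):
    """Accept a match only if the nearest neighbor is clearly closer than the second."""
    matches = []
    for qi, q in enumerate(query):
        d = np.linalg.norm(protected - q, axis=1)
        nn1, nn2 = np.argsort(d)[:2]
        if d[nn1] < ratio * d[nn2]:
            matches.append((qi, int(nn1)))
    return matches

matches = match(query, protected)
print(len(matches))  # → 10
```

Cropping removes keypoints entirely, while heavy distortion degrades the descriptors until this ratio test starts rejecting true matches, which is exactly the failure regime the paper measures.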
Citations: 2
Hybrid speaker tracking in an automated lecture room
Pub Date : 2005-07-06 DOI: 10.1109/ICME.2005.1521365
Cha Zhang, Y. Rui, Li-wei He, M. Wallick
We present a hybrid speaker tracking scheme based on a single pan/tilt/zoom (PTZ) camera in an automated lecture capturing system. Given that the camera's video resolution is higher than the required output resolution, we frame the output video as a sub-region of the camera's input video. This allows us to track the speaker both digitally and mechanically. Digital tracking has the advantage of being smooth, while mechanical tracking can cover a wide area; the hybrid scheme combines the benefits of both. In addition to hybrid tracking, we present an intelligent pan/zoom selection scheme to improve the aesthetics of the lecture scene.
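The hybrid policy can be sketched as follows, assuming hypothetical capture and output widths: pan digitally by sliding the output window inside the full frame, and fall back to a mechanical pan only when the window would leave the frame:

```python
FRAME_W, OUT_W = 1920, 1280   # hypothetical capture width vs output width

def track(speaker_x):
    """Return (window_x, mechanical_pan_px) that keeps the speaker centered."""
    desired = speaker_x - OUT_W // 2
    clamped = min(max(desired, 0), FRAME_W - OUT_W)   # keep window inside frame
    return clamped, desired - clamped                  # nonzero residual → move the PTZ head

print(track(900))    # → (260, 0)    smooth digital pan only
print(track(1800))   # → (640, 520)  window hits the edge; camera must pan
```

Keeping the mechanical pan as a last resort is what makes the output video smooth most of the time.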
Citations: 26
Proactive Energy Optimization Algorithms for Wavelet-Based Video Codecs on Power-Aware Processors
Pub Date : 2005-07-06 DOI: 10.1109/ICME.2005.1521486
V. Akella, M. Schaar, W. Kao
We propose a systematic technique for characterizing the workload of a video decoder at a given time and transforming the shape of the workload to optimize the utilization of a critical resource without compromising the distortion incurred in the process. We call our approach proactive resource management. We illustrate our techniques by addressing the problem of minimizing the energy consumption of decoding a video sequence on a programmable processor that supports multiple voltages and frequencies. We evaluate two different heuristics for the underlying optimization problem that yield 50% to 92% improvements in energy savings compared to techniques that do not use dynamic adaptation.
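The core voltage/frequency decision can be sketched as picking the lowest operating point that still meets the frame deadline, since dynamic power grows roughly with V²f. The operating points and workloads below are hypothetical, not the paper's heuristics:

```python
# Hypothetical (frequency MHz, voltage V) operating points, lowest energy first.
OPERATING_POINTS = [(200, 0.9), (400, 1.0), (600, 1.2), (800, 1.4)]

def pick_point(workload_cycles, deadline_s):
    """Lowest-energy point that still decodes the frame before its deadline."""
    for freq_mhz, volt in OPERATING_POINTS:
        if workload_cycles / (freq_mhz * 1e6) <= deadline_s:
            return freq_mhz, volt
    return OPERATING_POINTS[-1]   # cannot meet deadline: best effort at max speed

print(pick_point(workload_cycles=12e6, deadline_s=1 / 30))  # → (400, 1.0)
```

"Proactive" in the paper means the workload shape is characterized and transformed ahead of time, rather than reacting after a deadline miss.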
Citations: 7
WA-TV: Webifying and Augmenting Broadcast Content for Next-Generation Storage TV
Pub Date : 2005-07-06 DOI: 10.1109/ICME.2005.1521716
H. Miyamori, Qiang Ma, Katsumi Tanaka
A method is proposed for viewing broadcast content that converts TV programs into Web content and integrates the results with complementary information retrieved from the Internet. Converting the programs into Web pages enables them to be skimmed for an overview and particular scenes to be easily explored. Integrating complementary information enables the programs to be viewed efficiently with value-added content. An intuitive, user-friendly browsing interface lets the user easily change the level of detail displayed for the integrated information by zooming. Preliminary testing of a prototype system for next-generation storage TV, "WA-TV", validated the proposed approach.
Citations: 3
An audio spread-spectrum data hiding system with an informed embedding strategy adapted to a Wiener filtering based receiver
Pub Date : 2005-07-06 DOI: 10.1109/ICME.2005.1521598
C. Baras, N. Moreau
A particular application of audio data hiding and watermarking systems consists of using the audio signal as a transmission channel for binary information. The system should ensure reliable, robust transmission under various channel perturbations while keeping computational cost low enough for real-time applications. In this paper, we present a hybrid spread-spectrum data hiding system that combines two state-of-the-art reference systems: one based on a real-time receiver and the other based on an informed embedding strategy with maximized robustness to additive perturbations. Experimental results assess the efficiency of the system in terms of (1) transmission reliability, which is significantly improved compared to the reference systems, and (2) computational cost, which makes real-time reception feasible for broadcast applications with off-line embedding.
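The spread-spectrum core that both reference systems build on can be sketched as follows. This is a minimal baseline under assumed toy parameters; the paper's Wiener-filter receiver and informed embedding refine exactly this scheme and are not shown:

```python
import numpy as np

rng = np.random.default_rng(7)
N = 4096                                   # samples carrying one hidden bit
host = rng.normal(size=N)                  # stand-in for an audio frame
chips = rng.choice([-1.0, 1.0], size=N)    # shared pseudo-random carrier

def embed(host, chips, bit, alpha=0.1):
    """Spread the bit over N samples at low amplitude alpha."""
    return host + alpha * (1.0 if bit else -1.0) * chips

def detect(received, chips):
    """Correlate against the carrier; the sign of the correlation decides the bit."""
    return int(np.dot(received, chips) > 0.0)

print(detect(embed(host, chips, 1), chips), detect(embed(host, chips, 0), chips))
```

The correlation's expected value is ±alpha·N, while the host contributes only zero-mean noise of standard deviation √N, which is why long chip sequences make detection reliable at inaudible amplitudes.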
Citations: 6
A HMM-Embedded Unsupervised Learning to Musical Event Detection
Pub Date : 2005-07-06 DOI: 10.1109/ICME.2005.1521428
Sheng Gao, Yongwei Zhu
In this paper, an HMM-embedded unsupervised learning approach is proposed to detect music events by grouping similar segments of the music signal. This approach can cluster segments based on the similarity of their spectral as well as temporal structures, which is not easily done with traditional similarity measures. Together with a Bayesian information criterion, the proposed approach can obtain a suitable event set that regularizes the complexity of the model structure. The natural product of the approach is a set of music events modeled by HMMs. Our experimental analyses show that the detected musical events carry more perceptual meaning and are more consistent than those from KL-distance-based clustering, and the learned events match our experience in spectrogram reading better. The approach's capacity is further evaluated on a music identification task: the identification error rate is reduced to 1.57%, a 56.3% relative error-rate reduction compared with a system trained using the KL-distance clustering method.
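The role of the Bayesian information criterion can be illustrated with a toy model-selection sketch: a model is scored by its data log-likelihood minus a penalty that grows with the parameter count. Simple 1-D Gaussians stand in for the HMM event models here; this is an assumed illustration, not the paper's procedure:

```python
import numpy as np

def gaussian_loglik(x, mean, var):
    """Log-likelihood of samples x under a single Gaussian."""
    return float(np.sum(-0.5 * (np.log(2 * np.pi * var) + (x - mean) ** 2 / var)))

def bic(loglik, n_params, n_samples):
    """Higher is better: fit minus a complexity penalty."""
    return loglik - 0.5 * n_params * np.log(n_samples)

rng = np.random.default_rng(1)
data = np.concatenate([rng.normal(-5, 1, 200), rng.normal(5, 1, 200)])

# One-event model: a single Gaussian over everything (2 parameters).
bic1 = bic(gaussian_loglik(data, data.mean(), data.var()), 2, len(data))

# Two-event model: split at 0 and fit each half (4 parameters).
lo, hi = data[data < 0], data[data >= 0]
ll2 = gaussian_loglik(lo, lo.mean(), lo.var()) + gaussian_loglik(hi, hi.mean(), hi.var())
bic2 = bic(ll2, 4, len(data))

print(bic2 > bic1)  # → True: on clearly bimodal data, two events win despite the penalty
```

Adding event models keeps improving the likelihood, so the penalty term is what stops the event set from growing without bound.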
Citations: 2
Speech-adaptive layered G.729 coder for loss concealments of real-time voice over IP
Pub Date : 2005-07-06 DOI: 10.1109/ICME.2005.1521637
B. Sat, B. Wah
In this paper, we propose a speech-adaptive layered-coding (LC) scheme for the loss concealments of real-time CELP-coded speech transmitted over IP networks. Based on the ITU G.729 CS-ACELP codec operating at 8 Kbps, we design a loss-robust speech-adaptive codec at the same bit rate. Our scheme employs LC with redundant packetization in order to conceal losses and adapt to dynamic loss conditions characterized by the loss rate and the degree of burst, while maintaining an acceptable end-to-end delay. By protecting only the most important excitation parameters of each frame according to its speech type, our approach enables more efficient use of the bit budget. Our scheme delivers good-quality speech with a level of protection similar to full replication under medium loss rates, provides speech quality similar to the standard G.729 under very low loss rates, and outperforms both for low-to-medium loss rates.
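The redundant-packetization idea can be sketched as follows, under the simplifying assumption that the "coarse layer" is just a copy of the previous frame's label (in the paper it is the protected subset of excitation parameters, selected per speech type):

```python
def packetize(frames):
    """Each packet carries its frame plus a coarse copy of the previous one."""
    packets, prev_coarse = [], None
    for f in frames:
        packets.append({"frame": f, "redundant": prev_coarse})
        prev_coarse = f          # stand-in for the coarse protected layer
    return packets

def depacketize(packets, lost):
    """Conceal a lost packet from the redundant copy in its successor."""
    out = []
    for i, p in enumerate(packets):
        if i not in lost:
            out.append(p["frame"])
        elif i + 1 < len(packets) and (i + 1) not in lost:
            out.append(packets[i + 1]["redundant"])   # concealed
        else:
            out.append(None)                          # unrecoverable burst loss
    return out

print(depacketize(packetize(["f0", "f1", "f2", "f3"]), lost={1}))
# → ['f0', 'f1', 'f2', 'f3']
```

A burst that drops consecutive packets defeats single-step redundancy, which is why the scheme also adapts to the degree of burstiness.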
Citations: 4