
Latest publications from the 2006 IEEE International Conference on Multimedia and Expo

Emotional Speech Synthesis using Subspace Constraints in Prosody
Pub Date: 2006-07-09 DOI: 10.1109/ICME.2006.262725
Shinya Mori, T. Moriyama, S. Ozawa
An efficient speech synthesis method that uses a subspace constraint on prosody is proposed. Conventional unit-selection methods concatenate speech segments stored in a database, which requires an enormous number of waveforms to synthesize varied emotional expressions for arbitrary texts. The proposed method employs principal component analysis to reduce the dimensionality of the prosodic components, which also allows new speech to be generated that resembles the training samples. The subspace constraint ensures that the prosody of the synthesized speech, including F0, power, and speech length, preserves the correlations present in the emotional-speech training samples. We assume that the combination of syllable count and accent type determines the correlative dynamics of prosody, and we construct a separate subspace for each such combination. The subspace is then linearly related to emotions by multiple regression on subjective evaluations of the training samples. Experimental results demonstrated that only 4 dimensions were sufficient to represent the emotion-induced prosodic changes, capturing over 90% of the total variance. The synthesized emotions were successfully recognized by listeners of the synthesized speech, especially for "anger", "surprise", "disgust", "sorrow", "boredom", "depression", and "joy".
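The pipeline described above (PCA on prosody vectors, then multiple regression from emotion ratings to subspace coefficients) can be sketched compactly. This is a toy illustration, not the authors' implementation: the data dimensions, the random training matrices, and the `synthesize_prosody` helper are all invented for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical training data: 50 utterances, each a 30-dim prosody vector
# (concatenated F0, power, and duration features for one syllable/accent class).
X = rng.normal(size=(50, 30))
# Hypothetical subjective emotion ratings (e.g. anger, joy, sorrow) per utterance.
E = rng.normal(size=(50, 3))

# PCA via SVD of the mean-centred data; keep 4 components, as in the paper.
mu = X.mean(axis=0)
U, s, Vt = np.linalg.svd(X - mu, full_matrices=False)
W = Vt[:4]                       # 4 x 30 basis of the prosody subspace
coeffs = (X - mu) @ W.T          # 50 x 4 subspace coordinates per utterance

# Multiple regression: predict subspace coordinates from emotion ratings
# (with a bias column appended).
A, *_ = np.linalg.lstsq(np.hstack([E, np.ones((50, 1))]), coeffs, rcond=None)

def synthesize_prosody(emotion):
    """Map an emotion rating vector to a prosody contour via the subspace."""
    c = np.append(emotion, 1.0) @ A   # regression -> subspace coefficients
    return mu + c @ W                 # back-project to a full prosody vector

contour = synthesize_prosody(np.array([1.0, 0.0, 0.0]))
```

Because generated contours always lie in the span of `W`, they inherit the correlations among F0, power, and duration found in the training samples, which is the point of the subspace constraint.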
Citations: 21
Implementation and Evolution of Packet Striping for Media Streaming Over Multiple Burst-Loss Channels
Pub Date: 2006-07-09 DOI: 10.1109/ICME.2006.262734
Gene Cheung, P. Sharma, Sung-Ju Lee
Modern mobile devices are multi-homed, with both WLAN and WWAN communication interfaces. In a community of nodes with such multi-homed devices, locally interconnected via high-speed WLAN but each globally connected to larger networks via low-speed WWAN, striping high-volume traffic from remote networks over a bundle of low-speed WWAN links can overcome the bandwidth mismatch between WLAN and WWAN. In our previous work, we showed that a packet striping system for such devices, in which an intermediate gateway maps delay-sensitive packets to multiple channels using a combination of retransmissions (ARQ) and forward error correction (FEC), can dramatically enhance overall performance. In this paper, we improve upon the previous algorithm in two respects. First, by introducing two-tier dynamic programming tables to memoize computed solutions, packet striping decisions reduce to simple table lookups given stationary network statistics, drastically lowering the complexity of striping operations. Second, new weighting functions are introduced into the hybrid ARQ/FEC algorithm to steer the long-term evolution of the striping system away from pathological local minima far from the global optimum. Results show that the new algorithm runs efficiently and, by avoiding local minima, outperforms the previous algorithm.
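The two-tier memoization idea can be illustrated with a deliberately simplified model. Everything here is hypothetical: the two-channel loss model, the retransmission budget, and the objective (maximize expected deliveries) stand in for the paper's real delay-constrained formulation. The point is only that once the caches are warm, a striping decision is a table lookup.

```python
from functools import lru_cache

# Toy channel model: per-attempt delivery probability of each WWAN link.
CHANNELS = {"wwan0": 0.9, "wwan1": 0.7}
MAX_RETX = 3  # ARQ retransmission budget per packet (invented)

@lru_cache(maxsize=None)
def delivery_prob(channel, retx):
    """Tier 1 table: P(packet survives on `channel` given `retx` extra tries)."""
    p = CHANNELS[channel]
    return 1.0 - (1.0 - p) ** (retx + 1)

@lru_cache(maxsize=None)
def best_striping(n_packets, budget):
    """Tier 2 table: max expected deliveries for n packets under a total
    retransmission budget. Recursion fills the table; later queries with the
    same (n_packets, budget) arguments are pure lookups."""
    if n_packets == 0:
        return 0.0
    best = 0.0
    for ch in CHANNELS:
        for r in range(min(budget, MAX_RETX) + 1):
            gain = delivery_prob(ch, r) + best_striping(n_packets - 1, budget - r)
            best = max(best, gain)
    return best
```

With stationary channel statistics (fixed `CHANNELS`), the cached tables never need recomputation, which mirrors the paper's claim that striping decisions translate to simple table lookups.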
Citations: 2
Online Training-Oriented Video Shooting Navigation System Based on Real-Time Camerawork Evaluation
Pub Date: 2006-07-09 DOI: 10.1109/ICME.2006.262772
Masahito Kumano, K. Uehara, Y. Ariki
In this paper, we propose an online training-oriented video shooting navigation system that evaluates camerawork in real time against a video grammar, training users to capture good shots for later editing work. Because the system must run at very high speed, we use luminance projection correlation and a structure tensor method to extract camerawork parameters in real time. From the camerawork analysis, each frame is classified into one of 7 camerawork types; based on the video grammar, the system issues 6 types of alarms and guides the user through the specified shot in real time while it is being filmed. Users can thereby acquire good shooting style naturally, simply by trying to reduce improper-camerawork alarms, without having to consider the video grammar itself.
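Luminance projection correlation, the cheap motion cue named above, reduces each frame to 1-D row/column sums and correlates them across frames. The sketch below is a toy version under invented assumptions (grayscale frames as arrays, a small shift search window); the paper's real-time detector and its sign conventions may differ.

```python
import numpy as np

def luminance_projections(frame):
    """Row and column sums of luminance: a cheap 1-D summary of a frame."""
    return frame.sum(axis=1), frame.sum(axis=0)

def estimate_pan(prev, curr, max_shift=8):
    """Estimate horizontal camera motion by correlating the column projections
    of consecutive frames at candidate shifts (normalized dot product)."""
    _, p_prev = luminance_projections(prev)
    _, p_curr = luminance_projections(curr)
    best_shift, best_score = 0, -np.inf
    for s in range(-max_shift, max_shift + 1):
        a = p_prev[max(0, s):len(p_prev) + min(0, s)]
        b = p_curr[max(0, -s):len(p_curr) + min(0, -s)]
        score = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9)
        if score > best_score:
            best_shift, best_score = s, score
    return best_shift

# A frame whose content is shifted 3 pixels horizontally should yield a
# pan estimate of magnitude 3 (sign depends on the chosen convention).
frame = np.random.default_rng(1).random((48, 64))
shifted = np.roll(frame, 3, axis=1)
est = estimate_pan(frame, shifted)
```

Working on projections instead of full frames turns a 2-D correlation into a handful of 1-D dot products, which is why this style of cue suits a real-time navigation loop.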
Citations: 13
A High Throughput VLSI Architecture Design for H.264 Context-Based Adaptive Binary Arithmetic Decoding with Look Ahead Parsing
Pub Date: 2006-07-09 DOI: 10.1109/ICME.2006.262510
Yao-Chang Yang, Chien-Chang Lin, Hsui-Cheng Chang, Ching-Lung Su, Jiun-In Guo
In this paper we present a high-throughput VLSI architecture for context-based adaptive binary arithmetic decoding (CABAD) in MPEG-4 AVC/H.264. To speed up the inherently sequential operations in CABAD, we break the processing bottleneck with a look-ahead codeword parsing technique over segmented context tables with cache registers, which reduces the cycle count by up to 53% on average. Implemented in a 0.18 µm CMOS technology, the proposed design outperforms an existing design, reducing hardware cost by 40% while achieving about 1.6 times the data throughput.
Citations: 17
Matching Faces with Textual Cues in Soccer Videos
Pub Date: 2006-07-09 DOI: 10.1109/ICME.2006.262444
M. Bertini, A. Bimbo, W. Nunziati
In soccer videos, the most significant actions are usually followed by close-up shots of the players taking part in the action. Automatically annotating the identity of the players in these shots would be considerably valuable for indexing and retrieval applications. However, because pose and illumination vary widely across shots, current face recognition methods are not suitable for this task. We show how the inherent multiple-media structure of soccer videos can be exploited to determine players' identities without relying on direct face recognition. The proposed method combines an interest point detector with the "reading" of textual cues that label a player with a name, such as the number depicted on the jersey or a superimposed text caption showing the player's name. Players not identified by this process are then assigned to one of the labeled faces by means of a face similarity measure, again based on the appearance of local salient patches. We present results obtained from soccer videos of various recent games between national teams.
Citations: 14
Scalability in Human Shape Analysis
Pub Date: 2006-07-09 DOI: 10.1109/ICME.2006.262651
Thomas Fourès, P. Joly
This paper proposes a new approach to human motion analysis. The main contribution is the proposed representation of the human body. Most existing systems are based on a model; when that model is known a priori, it cannot automatically adapt to user needs, to the level of detail that can actually be extracted, or to processing time constraints. To obtain a more flexible system, we implement a hierarchical representation of the human body, aimed at providing a multi-resolution description and results at different levels of accuracy. We explain how the model is constructed and how it is mapped onto features extracted from an image sequence. Relations between the body limbs and some physical constraints are then integrated. We also explain the transition from one model level to the next, and results on frames from a video sequence illustrate the proposed strategy.
Citations: 2
A Cross-Layered Peer-to-Peer Architecture for Wireless Mobile Networks
Pub Date: 2006-07-09 DOI: 10.1109/ICME.2006.262625
Mohammad Mursalin Akon, S. Naik, Ajit Singh, Xuemin Shen
In this paper, we propose a novel peer-to-peer architecture for wireless mobile networks, at the heart of which is a cross-layered, gossip-like protocol. The goals of this architecture are to reduce bandwidth consumption while giving users more flexibility in participation. Simulation results demonstrate the performance of the proposed peer-to-peer architecture.
Citations: 3
GPU Accelerated Inverse Photon Mapping for Real-Time Surface Reflectance Modeling
Pub Date: 2006-07-09 DOI: 10.1109/ICME.2006.262528
Takashi Machida, N. Yokoya, H. Takemura
This paper investigates object surface reflectance modeling, sometimes referred to as inverse reflectometry, for photorealistic rendering and effective multimedia applications. Many methods have been developed to estimate object surface reflectance properties so that real objects can be rendered under arbitrary illumination conditions, but densely estimating those properties in real time remains difficult. This paper describes a new method for real-time estimation of non-uniform surface reflectance properties within an inverse rendering framework. Comparative experiments demonstrate the usefulness and advantages of the proposed method.
Citations: 0
Video News Shot Labeling Refinement via Shot Rhythm Models
Pub Date: 2006-07-09 DOI: 10.1109/ICME.2006.262544
J. Kender, M. Naphade
We present a three-step post-processing method for increasing the precision of video shot labels in the domain of television news. First, we demonstrate that news shot sequences can be characterized by rhythms of alternation (due to dialogue), repetition (due to persistent background settings), or both; a temporal model is therefore necessarily third-order Markov. Second, we demonstrate that the outputs of feature detectors derived from machine learning methods (in particular, SVMs) can be converted into probabilities more effectively than with two suggested existing methods. This is particularly true when detectors are error-prone due to sparse training sets, as is common in this domain. Third, we demonstrate that a straightforward application of the Viterbi algorithm to a third-order FSM, constructed from observed transition probabilities and the converted feature detector outputs, can refine feature label precision at little cost. On a test corpus of TRECVID 2005 news videos annotated with 39 LSCOM-lite features, the mean increase in average precision (AP) was 4%, with some of the rarer and more difficult features showing relative AP increases of as much as 67%.
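The third step lends itself to a compact illustration: Viterbi decoding where the hidden state is the pair of previous labels, so each transition conditions the next label on the two before it (a higher-order model folded into a first-order one). This toy sketch is not the authors' implementation; the two-label alphabet, scores, and alternation-favoring transition table are invented.

```python
import itertools
import math

def viterbi_third_order(obs_loglik, labels, trans):
    """Refine per-shot label scores with a shot-rhythm model.
    obs_loglik: list of {label: log P(features | label)} per shot (len >= 2).
    trans: dict ((prev2, prev1), next) -> log transition probability."""
    states = list(itertools.product(labels, repeat=2))
    # Initialise with the first two shots' observation scores.
    V = {(a, b): obs_loglik[0][a] + obs_loglik[1][b] for a, b in states}
    back = []
    for t in range(2, len(obs_loglik)):
        V_new, bp = {}, {}
        for (b, c) in states:
            best_a, best_score = None, -math.inf
            for a in labels:
                s = V[(a, b)] + trans.get(((a, b), c), -math.inf) + obs_loglik[t][c]
                if s > best_score:
                    best_a, best_score = a, s
            V_new[(b, c)] = best_score
            bp[(b, c)] = best_a
        V, back = V_new, back + [bp]
    # Trace back from the best final pair, then unwind the pointers.
    (b, c), _ = max(V.items(), key=lambda kv: kv[1])
    seq = [b, c]
    for bp in reversed(back):
        seq.insert(0, bp[(seq[0], seq[1])])
    return seq

labels = ["A", "B"]
# Alternation-favouring rhythm: the next label likely differs from the last.
trans = {((a, b), c): math.log(0.9 if c != b else 0.1)
         for a in labels for b in labels for c in labels}
obs = [{"A": 0.0, "B": -5.0}, {"A": -5.0, "B": 0.0},
       {"A": -0.1, "B": -0.1}, {"A": -5.0, "B": 0.0}]
refined = viterbi_third_order(obs, labels, trans)
```

Here the third shot's observation score is ambiguous, and the rhythm model resolves it in favor of continuing the alternation, which is exactly the kind of correction the paper applies to errorful detector outputs.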
Citations: 3
Joint Source-Channel Decoding of Multiple Description Quantized and Variable Length Coded Markov Sequences
Pub Date: 2006-07-09 DOI: 10.1109/ICME.2006.262808
X. Wang, Xiaolin Wu
This paper proposes a framework for joint source-channel decoding of Markov sequences that are encoded by an entropy-coded multiple description quantizer (MDQ) and transmitted over a lossy network. The framework is particularly suited to lossy networks of inexpensive, energy-constrained mobile source encoders. Our approach is maximum a posteriori (MAP) sequence estimation, exploiting both the source memory and the correlation between different MDQ descriptions. The MAP problem is modeled and solved as a longest-path problem in a weighted directed acyclic graph. For MDQ-compressed Markov sequences impaired by both bit errors and erasure errors, the proposed joint source-channel MAP decoder achieves 5 dB higher SNR than a conventional hard-decision decoder. Furthermore, the new MDQ decoding technique unifies the treatment of the different subsets of the K descriptions available at the decoder, circumventing the thorny requirement of up to 2^K - 1 MDQ side decoders.
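The longest-path formulation named above is a standard DAG dynamic program: relax edges in topological order, taking a max instead of a min. The sketch below shows only that skeleton on an invented four-node graph; in the paper's setting the nodes would be decoding states and the edge weights log-likelihood terms, neither of which is reproduced here.

```python
from collections import defaultdict, deque

def longest_path(n_nodes, edges, source):
    """Longest path from `source` in a weighted DAG.
    edges: iterable of (u, v, w) with u -> v of weight w.
    Returns the list of best path weights per node (-inf if unreachable)."""
    adj = defaultdict(list)
    indeg = [0] * n_nodes
    for u, v, w in edges:
        adj[u].append((v, w))
        indeg[v] += 1
    # Kahn's algorithm for a topological order.
    queue = deque(i for i in range(n_nodes) if indeg[i] == 0)
    topo = []
    while queue:
        u = queue.popleft()
        topo.append(u)
        for v, _ in adj[u]:
            indeg[v] -= 1
            if indeg[v] == 0:
                queue.append(v)
    neg_inf = float("-inf")
    dist = [neg_inf] * n_nodes
    dist[source] = 0.0
    # Relax in topological order, maximizing instead of minimizing.
    for u in topo:
        if dist[u] == neg_inf:
            continue
        for v, w in adj[u]:
            dist[v] = max(dist[v], dist[u] + w)
    return dist

# Toy graph: two routes from node 0 to node 3; the heavier one wins.
edges = [(0, 1, 1.0), (0, 2, 2.0), (1, 3, 2.0), (2, 3, 1.0)]
dist = longest_path(4, edges, 0)
```

Because the graph is acyclic, this runs in O(V + E), so MAP estimation stays linear in the number of candidate decoding transitions.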
Citations: 1