
2006 IEEE International Conference on Multimedia and Expo: Latest Publications

High Performance Fractional Motion Estimation and Mode Decision for H.264/AVC
Pub Date : 2006-07-09 DOI: 10.1109/ICME.2006.262762
Chao-Yang Kao, Huang-Chih Kuo, Y. Lin
We propose a high-performance architecture for fractional motion estimation and Lagrangian mode decision in H.264/AVC. Instead of time-consuming fractional-pixel interpolation and secondary search, our fractional motion estimator employs a mathematical model to estimate SADs at quarter-pixel positions. Both computation time and memory access requirements are greatly reduced without significant quality degradation. We also propose a novel cost function for mode decision that performs much better than the traditional low-complexity method. Synthesized in a TSMC 0.13 µm CMOS technology, our design takes 56 k gates at 100 MHz and is sufficient to process QUXGA (3200×2400) video sequences at 30 frames per second (fps). Compared with a state-of-the-art design operating at the same frequency, ours is 30% smaller and has 18 times the throughput at the expense of only a 0.05 dB PSNR difference.
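The abstract does not give the estimation model itself, so the sketch below only illustrates the general idea with an assumed quadratic SAD surface: fit SAD(x, y) over the 3×3 integer-pel neighborhood of the best match, then evaluate the fit at quarter-pel offsets instead of interpolating reference pixels and searching again. The function name and sample SAD values are illustrative, not from the paper.

```python
import numpy as np

def estimate_quarter_pel_sads(sad3x3, offsets):
    """Fit SAD(x, y) ~ a + bx + cy + dx^2 + ey^2 + fxy over the 3x3
    integer-pel SAD grid and evaluate the surface at fractional offsets,
    avoiding sub-pixel interpolation and a secondary search."""
    xs, ys = np.meshgrid([-1, 0, 1], [-1, 0, 1])
    X = np.column_stack([np.ones(9), xs.ravel(), ys.ravel(),
                         xs.ravel() ** 2, ys.ravel() ** 2, (xs * ys).ravel()])
    coef, *_ = np.linalg.lstsq(X, sad3x3.ravel(), rcond=None)
    a, b, c, d, e, f = coef
    return {(dx, dy): a + b * dx + c * dy + d * dx * dx + e * dy * dy + f * dx * dy
            for dx, dy in offsets}

# Quarter-pel offsets (in pel units) around the integer-pel minimum.
quarter = [(dx / 4, dy / 4) for dx in range(-3, 4) for dy in range(-3, 4)]
sads = np.array([[120.0, 95.0, 110.0],   # illustrative integer-pel SADs
                 [ 90.0, 60.0,  85.0],
                 [115.0, 92.0, 108.0]])
est = estimate_quarter_pel_sads(sads, quarter)
print(min(est, key=est.get))  # estimated best quarter-pel offset
```

The mode-decision side would then compare candidate modes with the usual Lagrangian cost J = D + λR, the paper's contribution being a modified version of that cost function.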
Citations: 17
Tensor-Based Multiple Object Trajectory Indexing and Retrieval
Pub Date : 2006-07-09 DOI: 10.1109/ICME.2006.262468
Xiang Ma, F. Bashir, A. Khokhar, D. Schonfeld
This paper presents novel tensor-based object trajectory modelling techniques for simultaneously representing the motion trajectories of multiple objects in a content-based indexing and retrieval framework. Three different tensor decomposition techniques (PARAFAC, HOSVD, and multiple-SVD) are explored to achieve this goal, with the aim of using a minimal set of coefficients and data-dependent bases. These tensor decompositions have been applied to represent full as well as segmented trajectories. Our simulation results show that the PARAFAC-based representation provides a higher compression ratio, superior precision-recall metrics, and smaller query processing time than the other tensor-based approaches.
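To make the decompositions above concrete, here is a minimal HOSVD sketch in plain NumPy (PARAFAC requires an iterative ALS solver, so the simpler orthogonal Tucker/HOSVD variant is shown). The tensor layout (objects × time samples × coordinates), the ranks, and the random data are assumptions for illustration only.

```python
import numpy as np

def unfold(T, mode):
    """Mode-n unfolding: mode-n fibers become the columns of a matrix."""
    return np.moveaxis(T, mode, 0).reshape(T.shape[mode], -1)

def hosvd(T, ranks):
    # Mode-n factor matrices: truncated left singular vectors of each unfolding.
    U = [np.linalg.svd(unfold(T, n), full_matrices=False)[0][:, :r]
         for n, r in enumerate(ranks)]
    # Core tensor: project T onto the truncated basis along every mode.
    G = T
    for n, Un in enumerate(U):
        G = np.moveaxis(np.tensordot(Un.T, np.moveaxis(G, n, 0), axes=1), 0, n)
    return G, U

# Toy tensor: 4 objects x 50 time samples x 2 coordinates (x, y).
rng = np.random.default_rng(0)
T = rng.standard_normal((4, 50, 2))
G, U = hosvd(T, ranks=(2, 5, 2))

# Reconstruction from the compressed (core + bases) representation.
R = G
for n, Un in enumerate(U):
    R = np.moveaxis(np.tensordot(Un, np.moveaxis(R, n, 0), axes=1), 0, n)
# Random data compresses poorly; real trajectories are smooth and compress well.
print(np.linalg.norm(T - R) / np.linalg.norm(T))
```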
Citations: 6
On the Potential of Incorporating Knowledge of Human Visual Attention into Cbir Systems
Pub Date : 2006-07-09 DOI: 10.1109/ICME.2006.262953
Oge Marques, Liam M. Mayron, G. Borba, H. Gamba
Content-based image retrieval (CBIR) systems have been actively investigated over the past decade. Several existing CBIR prototypes claim to be designed around perceptual characteristics of the human visual system, but even those are far from recognizing that they could benefit further by incorporating ongoing research in vision science. This paper explores the inclusion of human visual perception knowledge in the design and implementation of CBIR systems. In particular, it addresses the latest developments in the computational modeling of human visual attention. This fresh way of revisiting concepts in CBIR in light of the latest findings and open questions in vision science research has the potential to overcome some of the challenges faced by CBIR systems.
Citations: 10
An Automatic Classification System Applied in Medical Images
Pub Date : 2006-07-09 DOI: 10.1109/ICME.2006.262713
B. Qiu, Chang Xu, Q. Tian
In this paper, a multi-class classification system is developed for medical images. We mainly explore ways to use different image features and compare two classifiers: principal component analysis (PCA) and support vector machines (SVM) with RBF (radial basis function) kernels. Experimental results show that an SVM combining the mid-level blob feature with low-level features (down-scaled images and their texture maps) achieves the highest recognition accuracy. Using the 9000 given training images from ImageCLEFOS, our proposed method achieved a recognition rate of 88.9% in a simulation experiment. According to the evaluation results from the ImageCLEFOS organizer, our method achieved a recognition rate of 82% on its 1000 test images.
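A minimal sketch of the two classifiers being compared, using scikit-learn on a stand-in digits dataset (the ImageCLEF features are not available here). Using PCA as a nearest-class-mean classifier in the reduced space is one plausible reading of "PCA as a classifier"; all parameter values are illustrative.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_digits(return_X_y=True)  # stand-in for the image feature vectors
Xtr, Xte, ytr, yte = train_test_split(X, y, test_size=0.2, random_state=0)

# PCA-based classifier: nearest class mean in the reduced feature space.
pca = PCA(n_components=30).fit(Xtr)
Ztr, Zte = pca.transform(Xtr), pca.transform(Xte)
means = np.array([Ztr[ytr == c].mean(axis=0) for c in np.unique(ytr)])
pred_pca = np.argmin(((Zte[:, None, :] - means) ** 2).sum(-1), axis=1)
print("PCA accuracy:", (pred_pca == yte).mean())

# SVM with an RBF kernel on the same (scaled) features.
svm = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=10, gamma="scale"))
svm.fit(Xtr, ytr)
print("SVM-RBF accuracy:", svm.score(Xte, yte))
```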
Citations: 15
Semantic Segmentation of Documentary Video using Music Breaks
Pub Date : 2006-07-09 DOI: 10.1109/ICME.2006.262908
Aijuan Dong, Honglin Li
Many documentary videos use background music to help structure the content and communicate the semantics. In this paper, we investigate semantic segmentation of documentary video using music breaks. We first define video semantic units based on the speech text that a video/audio stream contains, and then propose a three-step procedure for semantic video segmentation using music breaks. Since the music breaks of a documentary video lie at different semantic levels, we also study how different speech/music segment lengths correlate with the semantic level of a music break. Our experimental results show that music breaks can effectively segment a continuous documentary video stream into semantic units, with an average F-score of 0.91, and that the lengths of combined segments (a speech segment plus the music segment that follows) strongly correlate with the semantic levels of music breaks.
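A toy sketch of the break-based grouping step, assuming an upstream audio classifier has already labeled the timeline with alternating speech and music segments. The threshold separating a semantic-level break from incidental music is an assumed parameter; this is not the paper's full three-step procedure.

```python
# segments: (kind, start_s, end_s) in temporal order, kind in {"speech", "music"}.
def segment_by_music_breaks(segments, min_break_s=3.0):
    """Group consecutive speech segments into semantic units, cutting the
    stream wherever a sufficiently long music break occurs."""
    units, current = [], []
    for kind, start, end in segments:
        if kind == "music" and (end - start) >= min_break_s and current:
            units.append(current)   # a long break closes the current unit
            current = []
        elif kind == "speech":
            current.append((start, end))
    if current:
        units.append(current)
    return units

timeline = [("speech", 0, 42), ("music", 42, 48),      # 6 s break: a cut
            ("speech", 48, 95), ("music", 95, 96.5),   # 1.5 s: too short
            ("speech", 96.5, 130), ("music", 130, 140)]
print(segment_by_music_breaks(timeline))
# -> [[(0, 42)], [(48, 95), (96.5, 130)]]
```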
Citations: 4
Semantic Multimedia Retrieval using Lexical Query Expansion and Model-Based Reranking
Pub Date : 2006-07-09 DOI: 10.1109/ICME.2006.262892
A. Haubold, A. Natsev, M. Naphade
We present methods for improving text-based retrieval of visual multimedia content by applying a set of visual models of semantic concepts drawn from a lexicon of concepts deemed relevant to the collection. Text search is performed via queries of words or fully qualified sentences, and results are returned in the form of ranked video clips. Our approach involves a query expansion stage, in which query terms are compared to the visual concepts for which we independently build classifier models. We leverage a synonym dictionary and WordNet similarities during expansion. Results for each query are aggregated across the expanded terms and ranked. We validate our approach on the TRECVID 2005 broadcast news data with 39 concepts specifically designed for this genre of video. We observe that concept models improve search results by nearly 50% after model-based re-ranking of text-only search. We also observe that purely model-based retrieval significantly outperforms text-based retrieval on non-named-entity queries.
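A minimal sketch of the lexical expansion step, assuming NLTK with the WordNet corpus installed (nltk.download("wordnet")). The concept lexicon, the Wu-Palmer similarity choice, and the threshold are illustrative stand-ins for the paper's 39-concept lexicon and its similarity measures.

```python
from nltk.corpus import wordnet as wn

LEXICON = ["car", "boat", "building", "person", "fire"]  # illustrative concepts

def expand_query(terms, threshold=0.8):
    """Map each query term to lexicon concepts via WordNet similarity,
    keeping the best similarity as the concept's weight."""
    matches = {}
    for term in terms:
        for concept in LEXICON:
            sims = [s1.wup_similarity(s2) or 0.0   # None for cross-POS pairs
                    for s1 in wn.synsets(term)
                    for s2 in wn.synsets(concept)]
            if sims and max(sims) >= threshold:
                matches[concept] = max(matches.get(concept, 0.0), max(sims))
    return matches

print(expand_query(["automobile", "blaze"]))  # e.g. maps to 'car' and 'fire'
```

Reranking would then combine, per clip, the text-search score with the matched concepts' classifier scores weighted by these similarities.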
Citations: 45
A Novel Resynchronization Method for Scalable Video Over Wireless Channel
Pub Date : 2006-07-09 DOI: 10.1109/ICME.2006.262869
Yu Wang, Lap-Pui Chau, Kim-Hui Yap
A scalable video coder generates a scalable compressed bit-stream, which can provide different types of scalability depending on different requirements. This paper proposes a novel resynchronization method for scalable video with combined temporal and quality (SNR) scalability. The main purpose is to improve the robustness of the transmitted video. In the proposed scheme, the video is encoded into a scalable compressed bit-stream with combined temporal and quality scalability, and the significance of each enhancement-layer unit is estimated appropriately. A novel resynchronization method is proposed in which a joint group-of-pictures (GOP)-level and picture-level insertion approach places different numbers of resynchronization markers in different enhancement-layer units for reliable transmission of the video over error-prone channels. Experimental results demonstrate that the proposed method degrades gracefully under a variety of error conditions, with an improvement of up to 1 dB over the conventional method.
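The abstract does not specify the allocation rule, so the sketch below only shows the underlying idea: spend a fixed marker budget on enhancement-layer units in proportion to their estimated significance. The budget and significance values are illustrative.

```python
def allocate_markers(significance, budget):
    """Distribute a fixed budget of resynchronization markers across
    enhancement-layer units in proportion to their estimated significance."""
    total = sum(significance) or 1.0
    alloc = [int(budget * s / total) for s in significance]
    # Hand out any rounding remainder to the most significant units first.
    for i in sorted(range(len(significance)),
                    key=lambda i: significance[i], reverse=True):
        if sum(alloc) >= budget:
            break
        alloc[i] += 1
    return alloc

# One GOP with four enhancement-layer units of varying significance.
print(allocate_markers([0.5, 0.2, 0.2, 0.1], budget=8))  # -> [5, 2, 1, 0]
```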
Citations: 1
Design and Implementation of a Multimedia Personalized Service Over Large Scale Networks
Pub Date : 2006-07-09 DOI: 10.1109/ICME.2006.262554
Xiaorong Li, T. Hung, B. Veeravalli
In this paper, we propose a distributed multimedia system that aggregates the capacity of multiple servers to provide customized multimedia services in a cost-effective way. Such a system enables clients to customize their services by specifying the service delay or the viewing times. We developed an experimental prototype in which media servers cooperate in stream caching, replication, and distribution. We applied a variety of stream distribution algorithms to the system and studied their performance under real-life conditions with limited network resources and varying request arrival patterns. The results show that such a system can provide cost-effective services and be applied in practical environments.
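One small, plausible piece of such a system is delay-constrained server selection. The sketch below is an assumption-laden illustration (server names, delays, and costs are invented), not the paper's distribution algorithms.

```python
def pick_server(servers, max_delay_s):
    """Choose the cheapest server whose startup delay meets the client's
    customized service-delay bound; return None if no server qualifies."""
    feasible = [s for s in servers if s["delay_s"] <= max_delay_s]
    return min(feasible, key=lambda s: s["cost"]) if feasible else None

servers = [{"name": "s1", "delay_s": 2.0, "cost": 5.0},
           {"name": "s2", "delay_s": 0.5, "cost": 9.0},
           {"name": "s3", "delay_s": 1.0, "cost": 6.0}]
print(pick_server(servers, max_delay_s=1.5))  # -> the "s3" entry
```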
Citations: 8
Conversation Scene Analysis with Dynamic Bayesian Network Based on Visual Head Tracking
Pub Date : 2006-07-09 DOI: 10.1109/ICME.2006.262677
K. Otsuka, Junji Yamato, Y. Takemae, H. Murase
A novel method based on a probabilistic model for conversation scene analysis is proposed that can infer conversation structure from video sequences of face-to-face communication. Conversation structure represents the type of conversation, such as monologue or dialogue, and can indicate who is talking or listening to whom. This study assumes that the gaze directions of participants provide cues for discerning the conversation structure and can be identified from head directions. To measure head directions, the proposed method employs a new visual head tracker based on sparse-template condensation. The conversation model is built on a dynamic Bayesian network and is used to estimate the conversation structure and gaze directions from observed head directions and utterances. Visual tracking is conventionally thought to be less reliable than contact sensors, but experiments confirm that the proposed method estimates gaze directions and conversation structure almost as well as a conventional sensor-based method.
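The paper's DBN performs probabilistic inference over time; as a rough, rule-based stand-in, the sketch below classifies a single time slice from gaze targets and speaking states. The monologue/dialogue rules and their thresholds are assumptions for illustration only.

```python
from collections import Counter

def infer_structure(gaze_targets, speaking):
    """Toy heuristic standing in for DBN inference: monologue if gaze
    converges on the single speaker, dialogue if two speakers attend to
    each other, otherwise 'other'."""
    speakers = [p for p, s in speaking.items() if s]
    counts = Counter(t for t in gaze_targets.values() if t is not None)
    if len(speakers) == 1 and counts.get(speakers[0], 0) >= len(gaze_targets) - 1:
        return "monologue", speakers[0]
    if len(speakers) == 2:
        a, b = speakers
        if gaze_targets.get(a) == b and gaze_targets.get(b) == a:
            return "dialogue", (a, b)
    return "other", None

gaze = {"A": None, "B": "A", "C": "A", "D": "A"}   # everyone watches A
talk = {"A": True, "B": False, "C": False, "D": False}
print(infer_structure(gaze, talk))  # -> ('monologue', 'A')
```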
Citations: 54
A Robust Image Watermarking Scheme Based on the Alpha-Beta Space
Pub Date : 2006-07-09 DOI: 10.1109/ICME.2006.262847
P. Martins, P. Carvalho
A robust image watermarking scheme relying on an affine-invariant embedding domain is presented. The invariant space is obtained by triangulating the image using affine-invariant interest points as vertices and computing a triangle representation that is invariant to affine transformations, based on the barycentric coordinate system. The watermark is encoded via quantization index modulation with an adaptive quantization step.
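The quantization-index-modulation step is standard enough to sketch: each watermark bit selects one of two interleaved quantization lattices, and decoding picks the lattice with the nearer reconstruction point. A fixed step is used here for simplicity, whereas the paper adapts it to the content.

```python
import numpy as np

def qim_embed(value, bit, step):
    """Quantize to the lattice selected by `bit`: multiples of `step`
    for bit 0, multiples offset by step/2 for bit 1."""
    offset = step / 2 if bit else 0.0
    return np.round((value - offset) / step) * step + offset

def qim_decode(value, step):
    """Decode by choosing the lattice whose nearest point is closer."""
    d0 = abs(value - qim_embed(value, 0, step))
    d1 = abs(value - qim_embed(value, 1, step))
    return 0 if d0 <= d1 else 1

step = 4.0   # fixed here; adaptive in the paper
x = 13.7     # an illustrative host coefficient
for bit in (0, 1):
    y = qim_embed(x, bit, step)
    print(bit, y, qim_decode(y, step))  # decoded bit matches embedded bit
```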
Citations: 0