Pub Date: 2006-07-09 | DOI: 10.1109/ICME.2006.262553
Wen Wu, Jie Yang, Jing Zhang
Trip planning and in-vehicle navigation are crucial tasks for easier and safer driving. Existing navigation systems are based on machine intelligence and do not allow the incorporation of human knowledge. These systems give turn guidance with abstract visual instructions and have not reached the potential of minimizing the driver's cognitive load, i.e., the amount of mental processing power required. In this paper, we describe the development of a multimedia system that makes driving and navigation safer and easier by offering tools for route sharing in trip planning and video-based route guidance during driving. The system provides a multimodal interface that lets a user share his or her route with others by drawing on a digital map, naturally incorporating human knowledge into the trip planning process. It gives driving instructions by overlaying navigational arrows onto live video and providing synthesized voice to reduce the driver's cognitive load, in addition to presenting landmark images for key maneuvers. We describe the observations that motivated the development of the system, detail its architecture and user interfaces, and finally discuss our initial test findings from real-road driving.
Title: A Multimedia System for Route Sharing and Video-Based Navigation
Pub Date: 2006-07-09 | DOI: 10.1109/ICME.2006.262407
A. Shoa, S. Shirani
The distortion of matching pursuit (MP) is calculated in terms of the MP encoder parameters for uniformly distributed signals and dictionaries. The MP encoder is then optimized using the analytically derived approximation of MP distortion. Our simulation results show that this optimized MP encoder exhibits optimal performance for nonuniform signal and dictionary distributions as well.
Title: Optimization of Matching Pursuit Encoder Based on Analytical Approximation of Matching Pursuit Distortion
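As background for the abstract above, the core matching pursuit loop can be sketched as follows. This is a minimal pure-Python illustration with a random toy dictionary, not the authors' optimized encoder; the iteration count stands in for the encoder parameters that trade rate against distortion.

```python
import math
import random

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def normalize(v):
    n = math.sqrt(dot(v, v))
    return [x / n for x in v]

def matching_pursuit(signal, dictionary, n_iters):
    """Greedy MP: at each step pick the unit-norm atom with the
    largest inner product with the residual and subtract it."""
    residual = list(signal)
    atoms, coeffs = [], []
    for _ in range(n_iters):
        best = max(range(len(dictionary)),
                   key=lambda i: abs(dot(residual, dictionary[i])))
        c = dot(residual, dictionary[best])
        residual = [r - c * d for r, d in zip(residual, dictionary[best])]
        atoms.append(best)
        coeffs.append(c)
    return atoms, coeffs, residual

# toy dictionary of random unit-norm atoms
rng = random.Random(0)
dim = 8
dictionary = [normalize([rng.uniform(-1, 1) for _ in range(dim)])
              for _ in range(32)]
signal = [rng.uniform(-1, 1) for _ in range(dim)]

# residual energy (distortion) shrinks as more atoms are spent
atoms, coeffs, res1 = matching_pursuit(signal, dictionary, 1)
_, _, res8 = matching_pursuit(signal, dictionary, 8)
```

Spending more atoms lowers distortion but raises the bit cost of coding the atom indices and coefficients, which is the trade-off the paper's optimization targets.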
Pub Date: 2006-07-09 | DOI: 10.1109/ICME.2006.262563
O. Pietquin
Because of the great variability of factors to take into account, designing a spoken dialogue system is still a tailoring task; rapid design and reuse of previous work are very difficult. For these reasons, the application of machine learning methods to dialogue strategy optimization has become a leading research subject over the last decade. Yet techniques such as reinforcement learning are very demanding in training data, while obtaining a substantial amount of data for spoken dialogues is time-consuming and therefore expensive. To expand existing data sets, dialogue simulation techniques are becoming a standard solution. In this paper we describe a user modeling technique for realistic simulation of man-machine goal-directed spoken dialogues. Unlike previously proposed models, this model, based on a stochastic description of man-machine communication, remains consistent throughout the interaction with respect to its history and a predefined user goal.
Title: Consistent Goal-Directed User Model for Realistic Man-Machine Task-Oriented Spoken Dialogue Simulation
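The notion of a goal-consistent simulated user can be illustrated with a small sketch. Everything here (the slot names, the `SimulatedUser` class, the extra-slot probability) is hypothetical; the point is only that sampled behavior never contradicts the fixed goal or the dialogue history, which is the consistency property the abstract emphasizes.

```python
import random

class SimulatedUser:
    """Toy goal-directed user model: answers are sampled behavior
    (the user may volunteer extra slots) but always stay consistent
    with a fixed goal and with the dialogue history."""
    def __init__(self, goal, extra_slot_prob=0.3, rng=None):
        self.goal = dict(goal)        # e.g. {"city": "Paris", ...}
        self.history = {}             # slots already communicated
        self.p = extra_slot_prob
        self.rng = rng or random.Random(42)

    def answer(self, asked_slot):
        response = {}
        if asked_slot in self.goal:
            response[asked_slot] = self.goal[asked_slot]
        # occasionally volunteer another goal slot (mixed initiative),
        # but never invent a value outside the goal
        remaining = [s for s in self.goal
                     if s not in self.history and s != asked_slot]
        if remaining and self.rng.random() < self.p:
            s = self.rng.choice(remaining)
            response[s] = self.goal[s]
        self.history.update(response)
        return response

user = SimulatedUser({"city": "Paris", "date": "Friday", "time": "noon"})
dialogue = [user.answer(slot) for slot in ("city", "date", "time")]
```

A reinforcement-learning dialogue manager can then be trained against such a simulator without the answers drifting away from the user's goal over long episodes.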
Pub Date: 2006-07-09 | DOI: 10.1109/ICME.2006.262911
Kazunori Matsumoto, Masaki Naito, K. Hoashi, F. Sugaya
This paper describes our new algorithm for shot boundary detection and its evaluation. We adopt a two-stage data fusion approach with SVMs to decide whether a boundary exists within a given video sequence. This approach helps avoid huge feature-space problems, even when we adopt many promising features extracted from a video sequence. We also introduce a novel feature to improve detection, consisting of two values extracted from a local frame sequence: the image difference between the target frame and the frame synthesized from its neighbors, and the difference between the neighbors themselves. This feature can be extracted quickly with a least-squares technique. Evaluation of our algorithm is conducted within the TRECVID evaluation framework; our system achieved high performance on the shot boundary detection task in TRECVID 2005.
Title: SVM-Based Shot Boundary Detection with a Novel Feature
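The novel feature described above (least-squares synthesis of the target frame from its neighbors) can be sketched as follows. Frames are reduced to plain vectors and the SVM fusion stages are omitted, so this is an illustrative toy rather than the authors' system.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def synth_error(prev, target, nxt):
    """Least-squares synthesis of the target frame from its two
    neighbours: target ~ a*prev + b*nxt, solved via the 2x2 normal
    equations; returns the residual energy."""
    pp, nn, pn = dot(prev, prev), dot(nxt, nxt), dot(prev, nxt)
    pt, nt = dot(prev, target), dot(nxt, target)
    det = pp * nn - pn * pn
    if det == 0:  # degenerate neighbours
        return dot(target, target)
    a = (pt * nn - nt * pn) / det
    b = (nt * pp - pt * pn) / det
    resid = [t - a * p - b * n for t, p, n in zip(target, prev, nxt)]
    return dot(resid, resid)

def boundary_feature(prev, target, nxt):
    """Two-component feature: synthesis error plus the plain
    difference energy between the two neighbours."""
    d = [p - n for p, n in zip(prev, nxt)]
    return synth_error(prev, target, nxt), dot(d, d)

# inside a shot: the middle frame is a blend of its neighbours
a = [1.0, 2.0, 3.0, 4.0]
b = [1.1, 2.1, 3.1, 4.1]
mid = [(x + y) / 2 for x, y in zip(a, b)]
smooth = boundary_feature(a, mid, b)

# across a cut: the middle frame comes from different content
cut = boundary_feature(a, [9.0, -3.0, 7.0, 0.5], b)
```

A classifier (the SVMs in the paper) would then separate the low synthesis error of in-shot frames from the high error at cuts.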
Pub Date: 2006-07-09 | DOI: 10.1109/ICME.2006.262601
D. Lelescu
The use of block transforms for coding intra-frames in video coding may preclude higher coding performance due to residual correlation across block boundaries and insufficient energy compaction, which translates into unrealized rate-distortion gains. Subjectively, blocking artifacts are common. Post-filters and lapped transforms offer good solutions to these problems; lapped transforms provide a more general framework that can incorporate coordinated pre- and post-filtering operations. Most common are fixed lapped transforms (such as lapped orthogonal transforms) and transforms with adaptive basis-function length. In contrast, in this paper we determine a lapped transform that non-linearly adapts its basis functions to local image statistics and the quantization regime. This transform was incorporated into the H.264/AVC codec and its performance evaluated. Significant rate-distortion gains of up to 0.45 dB (average 0.35 dB) PSNR were obtained compared to the H.264/AVC codec alone.
Title: Nonlinearly-Adapted Lapped Transforms for Intra-Frame Coding
Pub Date: 2006-07-09 | DOI: 10.1109/ICME.2006.262848
Akihiro Matsuoka, Kiyoshi Tanaka, A. Yoneyama, Y. Nakajima
In this work, we propose a data embedding scheme for the MPEG-1/Audio Layer II compressed domain. Data embedding is conducted in every audio access unit (AAU) by using side information (the locations of sub-bands allocated an audio signal) as the data carrier. In general, non-zero signals concentrate in the low and middle frequency bands, so we use high-frequency sub-bands that are allocated no audio signal to embed information. The proposed scheme can increase the payload while achieving rewritable (reversible) data embedding by choosing appropriate parameters. We verify the basic performance of our scheme through computer simulation using voice and music signals.
Title: Data Embedding in MPEG-1/Audio Layer II Compressed Domain using Side Information
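A toy model of side-information embedding may help make the idea concrete. This sketch reduces the AAU to a bare bit-allocation table and assumes (hypothetically) that all sub-bands above a cutoff are unallocated; real MPEG-1 Layer II framing is considerably more involved.

```python
CUTOFF = 24  # toy assumption: sub-bands from here up carry no audio

def embed(alloc, bits):
    """Encode one payload bit per high sub-band by toggling its
    allocation flag; reversible because the original high bands
    are all unallocated."""
    assert all(a == 0 for a in alloc[CUTOFF:]), "high bands must be empty"
    if len(bits) > len(alloc) - CUTOFF:
        raise ValueError("payload exceeds capacity")
    out = list(alloc)
    for k, bit in enumerate(bits):
        out[CUTOFF + k] = 1 if bit else 0
    return out

def extract(alloc, n_bits):
    """Read the payload back from the high-band allocation flags."""
    return [1 if alloc[CUTOFF + k] else 0 for k in range(n_bits)]

def restore(alloc):
    """Rewritable (reversible) step: clear the high bands again."""
    return alloc[:CUTOFF] + [0] * (len(alloc) - CUTOFF)

# 32-sub-band allocation table: audio energy in the low/mid bands
table = [8, 8, 6, 6, 4, 4, 3, 3] + [2] * 16 + [0] * 8
payload = [1, 0, 1, 1, 0]
marked = embed(table, payload)
```

The cutoff plays the role of the paper's tunable parameter: moving it down raises capacity at the cost of touching bands that might occasionally carry audio.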
Pub Date: 2006-07-09 | DOI: 10.1109/ICME.2006.262620
ZhenQiu Zhang, G. Potamianos, Stephen M. Chu, J. Tu, Thomas S. Huang
We present a robust vision system for single-person tracking inside a smart room using multiple synchronized, calibrated, stationary cameras. The system consists of two main components, namely initialization and tracking, assisted by an additional component that detects tracking drift. The main novelty lies in the adaptive tracking mechanism, which is based on subspace learning of the tracked person's appearance in selected two-dimensional camera views. The subspace is learned on the fly during tracking, but in contrast to the traditional approach in the literature, an additional "forgetting" mechanism is introduced as a means to reduce drift. The proposed algorithm replaces the mean-shift tracking previously employed in our work. By combining the proposed technique with a robust initialization component based on face detection and spatio-temporal dynamic programming, the resulting vision system significantly outperforms previously reported systems on the task of tracking a seminar presenter in data collected as part of the CHIL project.
Title: Person Tracking in Smart Rooms using Dynamic Programming and Adaptive Subspace Learning
Pub Date: 2006-07-09 | DOI: 10.1109/ICME.2006.262793
R. Eslami, J. Deller, H. Radha
Blind multiplicative watermarking schemes for speech signals using wavelets and the discrete cosine transform are presented. Watermarked signals are modeled using a generalized Gaussian distribution (GGD) and a Cauchy probability model. Detectors are developed employing a generalized likelihood ratio test (GLRT) and a locally most powerful (LMP) approach. The LMP scheme is used for the Cauchy distribution, while the GLRT estimates the gain factor as an unknown parameter in the GGD model. The detectors are tested using Monte Carlo simulation, and the results show the superiority of the proposed LMP/Cauchy detector in some experiments.
Title: On the Detection of Multiplicative Watermarks for Speech Signals in the Wavelet and DCT Domains
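The multiplicative embedding rule is easy to sketch. The detector below is a simplified correlation-style statistic, not the GLRT/LMP detectors derived in the paper, and the Laplacian source is only a stand-in for transform-domain speech coefficients.

```python
import random

rng = random.Random(7)

def laplacian(n, scale=1.0):
    """Heavy-tailed stand-in for wavelet/DCT speech coefficients."""
    return [rng.choice((-1, 1)) * rng.expovariate(1.0 / scale)
            for _ in range(n)]

def embed(coeffs, wm, gamma=0.1):
    """Multiplicative embedding: y_i = x_i * (1 + gamma * w_i),
    with w_i in {-1, +1}."""
    return [x * (1 + gamma * w) for x, w in zip(coeffs, wm)]

def detect(coeffs, wm):
    """Simplified blind statistic: the embedding scales |y_i| up
    where w_i = +1 and down where w_i = -1, so the mean of
    w_i * |y_i| concentrates above zero when the mark is present."""
    n = len(coeffs)
    return sum(w * abs(y) for w, y in zip(wm, coeffs)) / n

n = 5000
x = laplacian(n)
wm = [rng.choice((-1, 1)) for _ in range(n)]
marked = embed(x, wm)
stat_marked = detect(marked, wm)
stat_clean = detect(x, wm)
```

A decision threshold between the two statistic distributions (set from a false-alarm target, as in the paper's Monte Carlo tests) completes the detector.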
Pub Date: 2006-07-09 | DOI: 10.1109/ICME.2006.262717
C. Poucet, David Atienza Alonso, F. Catthoor
Modern multimedia applications make highly dynamic use of the memory hierarchy depending on the actual input, and therefore require run-time profiling techniques to enable optimizations. Because such applications can contain hundreds of thousands of lines of complex object-oriented specifications, profiling constitutes a tedious, time-consuming task, since profiling code is usually added manually. In this paper, we present a high-level, library-based approach for profiling both statically and dynamically defined variables using templates in C++. Our results on the visual texture coder of the MPEG-4 standard show that, using the information our approach provides, we can achieve 70.56% energy savings and a 19.22% reduction in memory accesses.
Title: Template-Based Semi-Automatic Profiling of Multimedia Applications
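The library-based idea can be illustrated in Python, with the caveat that the paper's approach relies on C++ templates; this analogue merely wraps a container so that reads and writes are counted without hand-inserted profiling code. The class and variable names are hypothetical.

```python
class ProfiledList:
    """Drop-in list wrapper that counts element reads and writes,
    so memory-access hot-spots can be found without manually
    instrumenting every access site."""
    def __init__(self, data, name):
        self._data = list(data)
        self.name = name
        self.reads = 0
        self.writes = 0

    def __getitem__(self, i):
        self.reads += 1
        return self._data[i]

    def __setitem__(self, i, value):
        self.writes += 1
        self._data[i] = value

    def __len__(self):
        return len(self._data)

def report(*tracked):
    """Collect per-variable access counts after a run."""
    return {t.name: (t.reads, t.writes) for t in tracked}

# declaring the variable through the library is the only change
buf = ProfiledList([0] * 8, "dct_buffer")
for i in range(len(buf)):     # the loop under study
    buf[i] = buf[i] + 1       # one read + one write per element

stats = report(buf)
```

In the C++ setting the same effect is obtained at compile time via templated wrapper types, so the profiled build keeps near-native performance.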
Pub Date: 2006-07-09 | DOI: 10.1109/ICME.2006.262710
Cheng-Hung Li, Chih-Yi Chiu, Chun-Rong Huang, Chu-Song Chen, Lee-Feng Chien
The rapid growth of digital photography in recent years has spurred the need for photo management tools. In this study, we propose an automatic organization framework for photo collections based on image content, providing a novel browsing experience for users. For each photograph, human faces, together with the corresponding clothes and nearby regions, are located, and color histograms of these regions are extracted as the image content feature. A similarity matrix of the photo collection is then generated from the temporal and content features of the photographs. We perform hierarchical clustering based on this matrix and extract duplicate subjects within a cluster by introducing the contrast context histogram (CCH) technique. The experimental results show that the developed framework provides promising results for photo management.
Title: Image Content Clustering and Summarization for Photo Collections
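The clustering stage of such a framework can be sketched with coarse color histograms and greedy single-linkage merging; face/clothes detection, temporal features, and the CCH duplicate-subject step are omitted, and the threshold is illustrative.

```python
def histogram(pixels, bins=4):
    """Coarse, normalised colour histogram of an (r, g, b) pixel list."""
    h = [0.0] * (bins ** 3)
    for r, g, b in pixels:
        idx = ((r * bins // 256) * bins * bins
               + (g * bins // 256) * bins
               + (b * bins // 256))
        h[idx] += 1
    total = sum(h) or 1.0
    return [v / total for v in h]

def hist_distance(h1, h2):
    """L1 distance between two normalised histograms."""
    return sum(abs(a - b) for a, b in zip(h1, h2))

def single_linkage(items, dist, threshold):
    """Greedy agglomerative clustering: repeatedly merge the two
    closest clusters until the closest pair exceeds the threshold."""
    clusters = [[i] for i in range(len(items))]
    while len(clusters) > 1:
        d, i, j = min((dist(items[a], items[b]), i, j)
                      for i in range(len(clusters))
                      for j in range(i + 1, len(clusters))
                      for a in clusters[i] for b in clusters[j])
        if d > threshold:
            break
        clusters[i] += clusters[j]
        del clusters[j]
    return clusters

# toy "photos": two mostly-red shots and one mostly-blue shot
red1 = [(250, 10, 10)] * 90 + [(10, 10, 250)] * 10
red2 = [(240, 20, 15)] * 85 + [(20, 240, 20)] * 15
blue = [(10, 10, 250)] * 95 + [(250, 10, 10)] * 5
hists = [histogram(p) for p in (red1, red2, blue)]
groups = single_linkage(hists, hist_distance, threshold=0.5)
```

In the full system the pairwise distance would mix region-based color features with capture-time proximity before the hierarchical clustering step.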