
2012 IEEE International Symposium on Multimedia: Latest Publications

Using Low Level Gradient Channels for Computationally Efficient Object Detection and Its Application in Logo Detection
Pub Date: 2012-12-10 DOI: 10.1109/ISM.2012.51
Yu Chen, V. Thing
We propose a logo detection approach that uses Haar (Haar-like) features computed directly from the gradient orientation channel, the gradient magnitude channel, and the gray intensity channel to effectively and efficiently extract discriminating features for a variety of logo images. The major contributions of this work are two-fold: 1) we explicitly demonstrate that, with an optimized design and implementation, considerable discriminative power can be obtained from simple features, such as Haar features extracted directly from the low-level gradient orientation and magnitude channels; 2) we propose an effective and efficient logo detection approach using the Haar features obtained directly from the gradient orientation, magnitude, and gray image channels. Experimental results on collected merchandise images of Louis Vuitton (LV) and Polo Ralph Lauren (PRL) products show the promising applicability of our approach.
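To make the channel construction concrete, here is a minimal NumPy sketch (not the authors' implementation) that builds the gradient magnitude and orientation channels and evaluates a two-rectangle Haar-like feature on a channel via an integral image; the integral image is what keeps each feature evaluation at constant cost.

```python
import numpy as np

def gradient_channels(gray):
    """Low-level channels: gradient magnitude and orientation of a gray image."""
    gy, gx = np.gradient(gray.astype(np.float64))
    magnitude = np.hypot(gx, gy)
    orientation = np.arctan2(gy, gx)  # radians in [-pi, pi]
    return magnitude, orientation

def integral_image(channel):
    """Summed-area table padded with a zero row/column for O(1) box sums."""
    ii = np.zeros((channel.shape[0] + 1, channel.shape[1] + 1))
    ii[1:, 1:] = channel.cumsum(axis=0).cumsum(axis=1)
    return ii

def box_sum(ii, r0, c0, r1, c1):
    """Sum of channel[r0:r1, c0:c1] read from the integral image."""
    return ii[r1, c1] - ii[r0, c1] - ii[r1, c0] + ii[r0, c0]

def haar_two_rect(ii, r, c, h, w):
    """Two-rectangle Haar-like feature: left half minus right half of a window."""
    half = w // 2
    return box_sum(ii, r, c, r + h, c + half) - box_sum(ii, r, c + half, r + h, c + w)
```

The same haar_two_rect call works unchanged on the magnitude, orientation, or gray channel, which is the kind of reuse across low-level channels that the paper exploits.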
Citations: 0
3D Scene Generation by Learning from Examples
Pub Date: 2012-12-10 DOI: 10.1109/ISM.2012.19
Mesfin Dema, H. Sari-Sarraf
Due to the overwhelming use of 3D models in video games and virtual environments, there is growing interest in 3D scene generation, scene understanding, and 3D model retrieval. In this paper, we introduce a data-driven 3D scene generation approach from a Maximum Entropy (MaxEnt) model selection perspective. Using this model selection criterion, new scenes can be sampled by matching a set of contextual constraints extracted from the training and synthesized scenes. Starting from a set of randomly synthesized configurations of objects in 3D, the MaxEnt distribution is iteratively sampled (using Metropolis sampling) and updated until the constraint statistics of the training and synthesized scenes match, indicating the generation of plausible synthesized 3D scenes. To illustrate the proposed methodology, we use 3D training desk scenes that are all composed of seven predefined objects with different position, scale, and orientation arrangements. After applying the MaxEnt framework, the synthesized scenes show that the proposed strategy can generate scenes reasonably similar to the training examples without any human supervision during sampling. Such an approach is not limited to desk scene generation, however, and can be extended to any 3D scene generation problem.
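The Metropolis sampling step described above can be sketched generically. The fragment below assumes a hypothetical stats_fn that maps an object configuration (one row of position/scale/orientation parameters per object) to the contextual constraint statistics; it is only a sketch of the accept/reject rule driving the synthesized statistics toward the training statistics, not the paper's full MaxEnt machinery.

```python
import numpy as np

rng = np.random.default_rng(0)

def energy(config, target_stats, stats_fn):
    """Mismatch between the synthesized scene's constraint statistics and the
    statistics extracted from training scenes (squared error for simplicity)."""
    return np.sum((stats_fn(config) - target_stats) ** 2)

def metropolis_step(config, target_stats, stats_fn, step=0.1):
    """Perturb one object's pose parameters and accept with the Metropolis rule."""
    proposal = config.copy()
    i = rng.integers(len(config))
    proposal[i] += rng.normal(scale=step, size=config.shape[1])
    delta = energy(proposal, target_stats, stats_fn) - energy(config, target_stats, stats_fn)
    if delta < 0 or rng.random() < np.exp(-delta):
        return proposal  # accepted: moves toward matching the training statistics
    return config        # rejected: keep the current configuration
```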
Citations: 5
Quantifying the Makeup Effect in Female Faces and Its Applications for Age Estimation
Pub Date: 2012-12-10 DOI: 10.1109/ISM.2012.29
Ranran Feng, B. Prabhakaran
In this paper, a comprehensive statistical study of the makeup effect on facial parts (skin, eyes, and lips) is conducted first. Based on this statistical study, a method is proposed to detect whether makeup has been applied in an input facial image; the makeup effect is then further quantified as a Young Index (YI) for female age estimation. An age estimator that takes the makeup effect into account is presented in this paper. Experimental results show that, with the makeup effect considered, the proposed method improves CS (Cumulative Score) by 0.9-6.7% and improves MAE (Mean of Absolute Errors between the estimated age and the ground-truth age labeled in or acquired from the data) by 0.26-9.76 compared with other age estimation methods.
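The two evaluation measures quoted above have standard definitions, which the following minimal NumPy sketch makes concrete: MAE is the mean absolute difference between estimated and ground-truth ages, and CS(t) is the fraction of test images whose absolute error does not exceed t years.

```python
import numpy as np

def mae(estimated, ground_truth):
    """Mean of Absolute Errors between estimated and ground-truth ages."""
    return np.mean(np.abs(np.asarray(estimated) - np.asarray(ground_truth)))

def cumulative_score(estimated, ground_truth, tolerance):
    """CS(t): fraction of test images whose absolute age error is <= t years."""
    errors = np.abs(np.asarray(estimated) - np.asarray(ground_truth))
    return np.mean(errors <= tolerance)
```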
Citations: 10
A Motion-Sketch Based Video Retrieval Using MST-CSS Representation
Pub Date: 2012-12-10 DOI: 10.1109/ISM.2012.76
C. Chattopadhyay, Sukhendu Das
In this work, we propose a framework for a robust Content-Based Video Retrieval (CBVR) system driven by free-hand query sketches, using the Multi-Spectro Temporal-Curvature Scale Space (MST-CSS) representation. Our interface allows sketches to be drawn that depict the shape of the object in motion and its trajectory. We compute the MST-CSS feature representation from these cues and match it against a set of MST-CSS features generated offline from the video clips in the database (gallery). Results are displayed in rank-ordered similarity. Experiments on benchmark datasets show promising results.
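The abstract does not specify the matching criterion, so the sketch below uses plain Euclidean distance as a placeholder to illustrate the rank-ordered matching of a query feature against gallery features computed offline.

```python
import numpy as np

def rank_gallery(query_feature, gallery_features):
    """Rank gallery clips by distance to the query's feature vector; Euclidean
    distance is only a stand-in for the paper's actual MST-CSS matching."""
    q = np.asarray(query_feature, dtype=float)
    dists = np.array([np.linalg.norm(q - np.asarray(g, dtype=float))
                      for g in gallery_features])
    order = np.argsort(dists)  # most similar (smallest distance) first
    return order, dists[order]
```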
Citations: 4
Mutual Information Based Stereo Correspondence in Extreme Cases
Pub Date: 2012-12-10 DOI: 10.1109/ISM.2012.46
Qing Tian, GuangJun Tian
Stereo correspondence is an ill-posed problem, mainly due to matching ambiguity, which is especially serious in extreme cases where the corresponding relationship is unknown and can be very complicated. Mutual information (MI), which assumes no prior relationship between the matching pair, is a good solution to this problem. This paper proposes a context-aware approach based on mutual information and Markov Random Fields (MRF), with gradient information introduced into both the data term and the smoothness term of the MAP-MRF framework, where advanced techniques such as graph cuts can be used to find an accurate disparity map. The results show that the proposed context-aware method outperforms non-MI and traditional MI-based methods both quantitatively and qualitatively in some extreme cases.
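A histogram-based estimate of mutual information between two putatively corresponding image regions, as a minimal sketch of an MI data term (the paper's full method additionally uses gradient information and graph-cut optimization, which are omitted here):

```python
import numpy as np

def mutual_information(left, right, bins=32):
    """Histogram estimate of MI between co-located intensities of two views;
    no prior relationship between the intensity values is assumed."""
    joint, _, _ = np.histogram2d(left.ravel(), right.ravel(), bins=bins)
    pxy = joint / joint.sum()
    px = pxy.sum(axis=1, keepdims=True)   # marginal of the left image
    py = pxy.sum(axis=0, keepdims=True)   # marginal of the right image
    nonzero = pxy > 0
    return np.sum(pxy[nonzero] * np.log(pxy[nonzero] / (px @ py)[nonzero]))
```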
Citations: 1
Tag Cloud++ - Scalable Tag Clouds for Arbitrary Layouts
Pub Date: 2012-12-10 DOI: 10.1109/ISM.2012.66
Minwoo Park, D. Joshi, A. Loui
Tag clouds are becoming extremely popular in the multimedia community as a medium of exploration and expression. In this work, we take tag-cloud construction to a new level by allowing a tag cloud to take any arbitrary shape while preserving some ordering of the tags (here, alphabetical). Our method guarantees non-overlap among words and ensures a compact representation within the specified shape. Experiments on a variety of input tag sets and tag-cloud shapes show that the proposed method is promising and achieves real-time performance. Finally, we show the applicability of our method with an application in which tag clouds specific to places, people, and keywords are constructed and used for digital media selection within a social network domain.
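The abstract does not describe the layout algorithm itself, so the following is only a generic greedy sketch of the two guarantees it names: placements stay inside an arbitrary shape (via a caller-supplied inside_shape predicate, a hypothetical helper) and never overlap previously placed word boxes. Words are visited in their given order, which preserves an alphabetical arrangement if the input is sorted.

```python
import numpy as np

def overlaps(a, b):
    """Axis-aligned overlap test for boxes given as (x, y, width, height)."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    return ax < bx + bw and bx < ax + aw and ay < by + bh and by < ay + ah

def place_tags(word_sizes, inside_shape, canvas=(800, 600), tries=2000, seed=0):
    """Greedily place each word box at a random position that lies inside the
    target shape and does not overlap any previously placed box."""
    rng = np.random.default_rng(seed)
    placed = []
    for w, h in word_sizes:
        for _ in range(tries):
            x = rng.random() * (canvas[0] - w)
            y = rng.random() * (canvas[1] - h)
            box = (x, y, w, h)
            if inside_shape(box) and not any(overlaps(box, p) for p in placed):
                placed.append(box)
                break
    return placed
```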
Citations: 2
Spatio-temporal Gaussian Mixture Model for Background Modeling
Pub Date: 2012-12-10 DOI: 10.1109/ISM.2012.73
Y. Soh, Y. Hae, Intaek Kim
Background subtraction is widely employed for the detection of moving objects when the background does not show much dynamic behavior. Many background models have been proposed. Most of them analyze only the temporal behavior of pixels and ignore the spatial relations within a neighborhood, which may be the key to better separation of foreground from background when the background has dynamic activities. To remedy this, some researchers have proposed spatio-temporal approaches, usually in a block-based framework. Two recent reviews [1, 2] showed that the temporal kernel density estimation (KDE) method and the temporal Gaussian mixture model (GMM) perform about equally best among possible temporal background models. A spatio-temporal version of KDE has been proposed; for GMM, however, an explicit extension to the spatio-temporal domain is not easily found in the literature. In this paper, we propose an extension of GMM from the temporal domain to the spatio-temporal domain. We applied the method to well-known test sequences and found that the proposed model outperforms the temporal GMM.
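For reference, a single-pixel update step of the classic temporal GMM (in the Stauffer-Grimson style) is sketched below; a spatio-temporal extension in the spirit of this paper could feed a neighborhood feature vector rather than a lone pixel value into the same update, though the authors' exact formulation is not given in the abstract.

```python
import numpy as np

def gmm_update(pixel, means, variances, weights, lr=0.01, thresh=2.5):
    """One temporal update of a pixel's Gaussian mixture background model."""
    dist = np.abs(pixel - means) / np.sqrt(variances)
    k = int(np.argmin(dist))
    weights *= (1.0 - lr)                 # decay all component weights
    if dist[k] < thresh:                  # matched an existing component
        weights[k] += lr
        rho = lr / max(weights[k], 1e-6)
        means[k] += rho * (pixel - means[k])
        variances[k] += rho * ((pixel - means[k]) ** 2 - variances[k])
    else:                                 # no match: replace the weakest component
        j = int(np.argmin(weights))
        means[j], variances[j], weights[j] = pixel, 15.0 ** 2, lr
    weights /= weights.sum()              # renormalize the mixture
    return means, variances, weights
```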
Citations: 12
DLH/CLLS: An Open, Extensible System Design for Prosuming Lecture Recordings and Integrating Multimedia Learning Ecosystems
Pub Date: 2012-12-10 DOI: 10.1109/ISM.2012.97
Kai Michael Höver, Gundolf von Bachhaus, M. Hartle, M. Mühlhäuser
The production of lecture recordings is becoming increasingly important for university education and is highly appreciated by students. However, lecture recordings and the corresponding systems are only a subset of the different kinds of learning materials and learning tools that exist in learning environments. This demands learning system designs that are easily accessible, extensible, and open to integration with other environments, data sources, and user (inter-)actions. The contributions of this paper are as follows: we suggest a system that supports educators in presenting, recording, and providing their lectures, as well as a system design following Linked Data principles to facilitate integration and to let users interact both with each other and with learning materials.
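To illustrate what "following Linked Data principles" can look like in practice, the sketch below publishes a lecture recording as RDF triples with rdflib; the example.org namespace and the property names are hypothetical and are not the vocabulary actually used by DLH/CLLS.

```python
from rdflib import Graph, Literal, Namespace, URIRef
from rdflib.namespace import DCTERMS, RDF

# Hypothetical namespace and properties, for illustration only.
EX = Namespace("http://example.org/lectures/")

g = Graph()
lecture = URIRef(EX["lecture-42"])
g.add((lecture, RDF.type, EX.LectureRecording))
g.add((lecture, DCTERMS.title, Literal("Introduction to Multimedia Systems")))
g.add((lecture, EX.hasSlides, URIRef(EX["slides-42"])))
g.add((lecture, EX.recordedOn, Literal("2012-12-10")))

# Serializing to Turtle lets other learning tools consume the same triples.
print(g.serialize(format="turtle"))
```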
Citations: 12
A Variational Bayesian Inference Framework for Multiview Depth Image Enhancement
Pub Date: 2012-12-10 DOI: 10.1109/ISM.2012.44
P. Rana, Jalil Taghia, M. Flierl
In this paper, a general model-based framework for multiview depth image enhancement is proposed. Depth imagery plays a pivotal role in emerging free-viewpoint television. This technology requires high-quality virtual view synthesis to enable viewers to move freely in a dynamic real-world scene. Depth imagery from different viewpoints is used to synthesize an arbitrary number of novel views. Usually, the depth imagery is estimated individually by stereo-matching algorithms and hence lacks inter-view consistency. This inconsistency negatively affects the quality of view synthesis. This paper enhances the inter-view consistency of multiview depth imagery by using a variational Bayesian inference framework. First, our approach classifies the color information in the multiview color imagery. Second, using the resulting color clusters, we classify the corresponding depth values in the multiview depth imagery. Each clustered depth image is subject to further sub-clustering. Finally, the resulting means of the sub-clusters are used to enhance the depth imagery at multiple viewpoints. Experiments show that our approach improves the quality of virtual views by up to 0.25 dB.
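The two-stage structure (color clustering, then depth sub-clustering, then replacement of depths by sub-cluster means) can be sketched with plain k-means standing in for the paper's variational Bayesian inference, which this sketch does not implement:

```python
import numpy as np

def kmeans(data, k, iters=20, seed=0):
    """Plain k-means on (N, D) data; returns labels and cluster centers."""
    rng = np.random.default_rng(seed)
    centers = data[rng.choice(len(data), size=k, replace=False)].astype(float)
    for _ in range(iters):
        labels = np.argmin(((data[:, None, :] - centers[None]) ** 2).sum(-1), axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = data[labels == j].mean(axis=0)
    return labels, centers

def enhance_depth(colors, depths, k_color=8, k_depth=3):
    """Cluster pixels by color, sub-cluster each color cluster by depth, and
    replace every depth value with its depth sub-cluster mean."""
    flat_depth = depths.reshape(-1).astype(float)
    color_labels, _ = kmeans(colors.reshape(-1, 3).astype(float), k_color)
    for c in range(k_color):
        idx = np.flatnonzero(color_labels == c)
        if len(idx) < k_depth:
            continue  # too few pixels in this color cluster to sub-cluster
        sub_labels, sub_centers = kmeans(flat_depth[idx, None], k_depth, seed=c)
        flat_depth[idx] = sub_centers[sub_labels, 0]
    return flat_depth.reshape(depths.shape)
```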
Citations: 7
Video-Based Lane Detection Using a Fast Vanishing Point Estimation Method
Pub Date: 2012-12-10 DOI: 10.1109/ISM.2012.70
Burak Benligiray, C. Topal, C. Akinlar
Lane detection algorithms form a basis for intelligent vehicle systems such as lane tracking and involuntary lane departure detection. In this paper, we propose a simple, video-based lane detection algorithm that uses a fast vanishing point estimation method. The first step of the algorithm is to extract and validate line segments from the image with a recently proposed line detection algorithm. Next, an angle-based elimination of line segments is performed according to the perspective characteristics of lane markings. This basic operation removes many line segments that belong to irrelevant details in the scene and greatly reduces the number of features to be processed afterwards. The remaining line segments are extrapolated and superimposed to detect the image location where the majority of the linear edge features converge. The location found by this efficient operation is taken as the vanishing point. Subsequently, an orientation-based removal is performed by eliminating the line segments whose extensions do not intersect the vanishing point. The final step is clustering the remaining line segments such that each cluster represents a lane marking or a boundary of the road (i.e., sidewalks, barriers, or shoulders). The properties of the line segments that constitute a cluster are fused to represent each cluster with a single line. The two clusters nearest to the vehicle are chosen as the lines that bound the lane being driven on. The proposed algorithm runs in an average of 12 milliseconds per frame at 640×480 resolution on a 2.20 GHz Intel CPU. This performance metric shows that the algorithm can be deployed on minimal hardware and still provide real-time performance.
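As a rough sketch of the vanishing point step, the code below finds the least-squares point closest to the extensions of all line segments and then performs the orientation-based removal; the paper itself uses an extrapolate-and-superimpose vote, so this closed-form formulation is a simplified substitute.

```python
import numpy as np

def unit_normal(seg):
    """Unit normal of the infinite line through a segment ((x1, y1), (x2, y2))."""
    (x1, y1), (x2, y2) = seg
    n = np.array([y2 - y1, x1 - x2], dtype=float)
    return n / np.linalg.norm(n)

def vanishing_point(segments):
    """Least-squares point nearest to the extensions of all segments:
    solve n_i . p = n_i . a_i for p, where a_i is an endpoint of segment i."""
    A = np.array([unit_normal(s) for s in segments])
    c = np.array([unit_normal(s) @ np.array(s[0], dtype=float) for s in segments])
    vp, *_ = np.linalg.lstsq(A, c, rcond=None)
    return vp  # (x, y) where the lane edges converge

def keep_converging(segments, vp, tol=10.0):
    """Orientation-based removal: keep segments whose extension passes within
    tol pixels of the estimated vanishing point."""
    return [s for s in segments
            if abs(unit_normal(s) @ vp
                   - unit_normal(s) @ np.array(s[0], dtype=float)) <= tol]
```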
Citations: 35