
Latest publications from the 2015 IEEE International Conference on Multimedia and Expo (ICME)

Multi-modal learning for gesture recognition
Pub Date : 2015-08-06 DOI: 10.1109/ICME.2015.7177460
Congqi Cao, Yifan Zhang, Hanqing Lu
With the development of sensing equipment, data from different modalities are available for gesture recognition. In this paper, we propose a novel multi-modal learning framework. A coupled hidden Markov model (CHMM) is employed to discover the correlation and complementary information across different modalities. In this framework, we use two configurations: one is multi-modal learning and multi-modal testing, where all the modalities used during learning are still available during testing; the other is multi-modal learning and single-modal testing, where only one modality is available during testing. Experiments on two real-world gesture recognition data sets have demonstrated the effectiveness of our multi-modal learning framework. Improvements have been observed in both multi-modal and single-modal testing.
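The abstract does not include an implementation, so the following is only a minimal, self-contained sketch (not the authors' coupled HMM): one discrete-observation HMM per modality per gesture class, scored with the forward algorithm and fused at the decision level by summing log-likelihoods. All parameters, class names, and modality names are made-up placeholders.

```python
import numpy as np

def forward_log_likelihood(pi, A, B, obs):
    """Log-likelihood of a discrete observation sequence under an HMM.

    pi : (S,)   initial state distribution
    A  : (S, S) transition matrix, A[i, j] = P(s_t = j | s_{t-1} = i)
    B  : (S, V) emission matrix,   B[i, k] = P(o_t = k | s_t = i)
    obs: (T,)   integer observation symbols in [0, V)
    """
    alpha = pi * B[:, obs[0]]
    log_lik = 0.0
    for t in range(1, len(obs)):
        # Rescale to avoid underflow and accumulate the log of the scale factor.
        scale = alpha.sum()
        log_lik += np.log(scale)
        alpha = (alpha / scale) @ A * B[:, obs[t]]
    return log_lik + np.log(alpha.sum())

# Toy per-class HMMs for two modalities (e.g. a skeleton stream and a depth stream).
rng = np.random.default_rng(0)
def random_hmm(n_states=3, n_symbols=8):
    pi = rng.dirichlet(np.ones(n_states))
    A = rng.dirichlet(np.ones(n_states), size=n_states)
    B = rng.dirichlet(np.ones(n_symbols), size=n_states)
    return pi, A, B

gesture_classes = ["wave", "point"]
models = {g: {"skeleton": random_hmm(), "depth": random_hmm()} for g in gesture_classes}

# Decision-level fusion: sum per-modality log-likelihoods and pick the best class.
skeleton_obs = rng.integers(0, 8, size=20)
depth_obs = rng.integers(0, 8, size=20)
scores = {
    g: forward_log_likelihood(*m["skeleton"], skeleton_obs)
       + forward_log_likelihood(*m["depth"], depth_obs)
    for g, m in models.items()
}
print(max(scores, key=scores.get))
```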
Citations: 10
VTouch: Vision-enhanced interaction for large touch displays
Pub Date : 2015-08-06 DOI: 10.1109/ICME.2015.7177390
Yinpeng Chen, Zicheng Liu, P. Chou, Zhengyou Zhang
We propose a system that augments touch input with visual understanding of the user to improve interaction with a large touch-sensitive display. A commodity color plus depth sensor such as the Microsoft Kinect adds the visual modality and enables new interactions beyond touch. Through visual analysis, the system understands where the user is, who the user is, and what the user is doing even before the user touches the display. Such information is used to enhance interaction in multiple ways. For example, a user can use simple gestures to bring up menu items such as a color palette and a soft keyboard; menu items can be shown where the user is and can follow the user; hovering can show information to the user before the user commits to a touch; the user can perform different functions (for example, writing and erasing) with different hands; and the user's preference profile can be maintained, distinct from other users. User studies were conducted, and users greatly appreciated the value of these and other enhanced interactions.
Citations: 2
Distributed cooperative video coding for wireless video broadcast system
Pub Date : 2015-08-06 DOI: 10.1109/ICME.2015.7177521
Mengyao Sun, Yumei Wang, Hao Yu, Yu Liu
In wireless video broadcast systems, analog joint source-channel coding (JSCC) has an advantage over conventional separate digital source/channel coding in that it gracefully avoids the cliff effect. Moreover, analog JSCC requires little computation at the encoder and adapts well to varying channel conditions, which makes it well suited to wireless cooperative scenarios. In this paper, we therefore propose a distributed cooperative video coding (DCVC) scheme for wireless video broadcast systems. The scheme is based on the transmission structure of Softcast and borrows the basic idea of distributed video coding. Unlike earlier cooperative video delivery methods, DCVC uses analog coding and coset coding to avoid the cliff effect and to make the best use of the transmission power. The experimental results show that DCVC outperforms the conventional WSVC and H.264/SVC cooperative schemes, especially when the cooperative channel is worse than the original source-terminal channel.
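DCVC itself is not specified in enough detail here to reproduce, but the abstract states that it builds on the transmission structure of Softcast. Below is a hedged toy sketch of that Softcast-style analog pipeline only (full-frame DCT, per-chunk power allocation, AWGN channel, LLSE decoding); the chunk size, power budget, and noise level are arbitrary assumptions, and the coset-coding and cooperative parts of DCVC are not modeled.

```python
import numpy as np
from scipy.fft import dctn, idctn

def softcast_style_transmit(image, chunk=8, total_power=1.0, noise_std=0.05):
    """Toy Softcast-style analog transmission of one frame over an AWGN channel.

    Steps: 2D DCT -> split coefficients into chunks -> scale each chunk by
    (chunk variance)^(-1/4) under a power budget -> AWGN -> LLSE decoding -> inverse DCT.
    """
    coeffs = dctn(image.astype(np.float64), norm="ortho")
    h, w = coeffs.shape
    blocks = coeffs.reshape(h // chunk, chunk, w // chunk, chunk).swapaxes(1, 2)
    lam = blocks.var(axis=(2, 3)) + 1e-12            # per-chunk variance
    g = lam ** -0.25                                  # Softcast-style scaling
    g *= np.sqrt(total_power / np.mean(g**2 * lam))   # meet the power budget
    sent = g[..., None, None] * blocks
    received = sent + np.random.default_rng(1).normal(0, noise_std, sent.shape)
    # LLSE estimate of each chunk: E[x | y] for y = g*x + n with x ~ N(0, lam).
    llse = (g * lam / (g**2 * lam + noise_std**2))[..., None, None] * received
    restored = llse.swapaxes(1, 2).reshape(h, w)
    return idctn(restored, norm="ortho")

# Usage: transmit a random 64x64 "frame" and report the reconstruction MSE.
frame = np.random.default_rng(0).random((64, 64))
print(np.mean((softcast_style_transmit(frame) - frame) ** 2))
```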
Citations: 7
Flickr circles: Mining socially-aware aesthetic tendency
Pub Date : 2015-08-06 DOI: 10.1109/ICME.2015.7177384
Luming Zhang, Roger Zimmermann
Aesthetic tendency discovery is a useful and interesting application in social media. This paper proposes to categorize large-scale Flickr users into multiple circles, each containing users with similar aesthetic interests (e.g., landscapes or abstract paintings). We notice that: (1) an aesthetic model should be flexible, as different visual features may be needed to describe different image sets, and (2) the number of photos varies significantly across users, and some users have very few photos. Therefore, a regularized topic model is proposed to quantify a user's aesthetic interest as a distribution in the latent space. Then, a graph is built to describe the similarity of aesthetic interests among users, in which densely connected users share similar aesthetic interests. Thus, an efficient dense subgraph mining algorithm is adopted to group users into different circles. Experiments show that our approach accurately detects circles on an image set crawled from over 60,000 Flickr users.
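As an illustration of the last step only (grouping users by dense subgraph mining), here is a small sketch that substitutes Charikar's greedy densest-subgraph peeling for the paper's unspecified mining algorithm; the "aesthetic interest" vectors, the cosine-similarity threshold, and the graph construction are all invented for the example.

```python
import numpy as np
import networkx as nx

def densest_subgraph(G):
    """Greedy densest-subgraph peeling (Charikar's 1/2-approximation).

    Repeatedly removes the minimum-degree node and keeps the intermediate
    subgraph with the highest average degree.
    """
    H = G.copy()
    best_nodes, best_density = list(H.nodes), 0.0
    while H.number_of_nodes() > 0:
        density = 2.0 * H.number_of_edges() / H.number_of_nodes()
        if density > best_density:
            best_density, best_nodes = density, list(H.nodes)
        H.remove_node(min(H.degree, key=lambda kv: kv[1])[0])
    return best_nodes, best_density

# Toy "aesthetic interest" vectors: one row per user, one column per latent topic.
rng = np.random.default_rng(0)
interests = np.vstack([rng.dirichlet([8, 1, 1], 30),   # landscape-leaning users
                       rng.dirichlet([1, 1, 8], 30)])  # abstract-art-leaning users

# Connect users whose interest distributions are close (cosine similarity).
norms = np.linalg.norm(interests, axis=1)
sims = interests @ interests.T / (norms[:, None] * norms[None, :])
G = nx.Graph()
G.add_edges_from((i, j) for i in range(len(sims))
                 for j in range(i + 1, len(sims)) if sims[i, j] > 0.9)

circle, density = densest_subgraph(G)
print(f"circle of {len(circle)} users, average degree {density:.2f}")
```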
Citations: 7
Structure-preserving Image Quality Assessment
Pub Date : 2015-08-06 DOI: 10.1109/ICME.2015.7177436
Yilin Wang, Qiang Zhang, Baoxin Li
Perceptual Image Quality Assessment (IQA) has many applications. Existing IQA approaches typically work for only one of three scenarios: full-reference, no-reference, or reduced-reference. Techniques that attempt to incorporate image structure information often rely on hand-crafted features, making them difficult to extend to different scenarios. On the other hand, objective metrics like Mean Square Error (MSE), while easy to compute, are often deemed ineffective for measuring perceptual quality. This paper presents a novel approach to perceptual quality assessment by developing an MSE-like metric, which enjoys the benefits of MSE in terms of inexpensive computation and universal applicability while allowing the structural information of an image to be taken into consideration. The latter is achieved by introducing structure-preserving kernelization into an MSE-like formulation. We show that the method leads to competitive FR-IQA results. Further, by developing a feature coding scheme based on this formulation, we extend the model to improve the performance of NR-IQA methods. We report extensive experiments illustrating the results of both our FR-IQA and NR-IQA algorithms in comparison with existing state-of-the-art methods.
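The abstract does not define the structure-preserving kernelization, so the snippet below is only a toy illustration of the broader idea of an MSE-like metric that weights pixel errors by local image structure; the specific weighting (gradient magnitude of the reference) is an arbitrary assumption and not the paper's formulation.

```python
import numpy as np

def structure_weighted_mse(reference, distorted, eps=1e-8):
    """Toy MSE-like score that up-weights errors on structured (high-gradient) pixels.

    Illustrative only: the weighting scheme is an arbitrary stand-in, not the
    kernelization proposed in the paper.
    """
    gy, gx = np.gradient(reference.astype(np.float64))
    weight = np.hypot(gx, gy)
    weight = weight / (weight.sum() + eps)          # normalize to a distribution
    return float(np.sum(weight * (reference - distorted) ** 2))

# Usage: compare the structure-weighted score with plain MSE on a noisy image.
rng = np.random.default_rng(0)
ref = rng.random((32, 32))
noisy = ref + 0.1 * rng.standard_normal(ref.shape)
print(structure_weighted_mse(ref, noisy), np.mean((ref - noisy) ** 2))
```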
Citations: 3
A framework of extracting multi-scale features using multiple convolutional neural networks
Pub Date : 2015-08-06 DOI: 10.1109/ICME.2015.7177449
Kuan-Chuan Peng, Tsuhan Chen
Most works related to convolutional neural networks (CNN) use the traditional CNN framework, which extracts features at only a single scale. We propose multi-scale convolutional neural networks (MSCNN), which can not only extract multi-scale features but also address the issues of previous methods that use CNNs to extract multi-scale features. Under the assumption of the label-inheritable (LI) property, we also propose a method to generate exponentially more training examples for MSCNN from a given training set. Our experimental results show that MSCNN outperforms both the state-of-the-art methods and the traditional CNN framework on artist, artistic style, and architectural style classification, supporting that MSCNN outperforms the traditional CNN framework on tasks which at least partially satisfy the LI property.
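The MSCNN architecture is not described in the abstract; as a rough illustration of extracting multi-scale features with convolutional networks, the sketch below runs a shared toy backbone on several rescaled copies of the input and concatenates the globally pooled features. The scales, layer sizes, and feature dimensions are assumptions, not the authors' design.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiScaleFeatures(nn.Module):
    """Run a shared backbone on several rescaled copies of the image and
    concatenate the pooled features. Illustrative only; the actual MSCNN
    architecture is not given in the abstract."""

    def __init__(self, scales=(1.0, 0.5, 0.25), feat_dim=32):
        super().__init__()
        self.scales = scales
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(16, feat_dim, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),                 # global average pooling
        )

    def forward(self, x):
        feats = []
        for s in self.scales:
            xs = x if s == 1.0 else F.interpolate(
                x, scale_factor=s, mode="bilinear", align_corners=False)
            feats.append(self.backbone(xs).flatten(1))   # (batch, feat_dim)
        return torch.cat(feats, dim=1)                    # (batch, feat_dim * len(scales))

# Usage: a 224x224 RGB batch -> a 96-dimensional multi-scale descriptor per image.
model = MultiScaleFeatures()
print(model(torch.randn(2, 3, 224, 224)).shape)   # torch.Size([2, 96])
```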
Citations: 31
Exploring feature space with semantic attributes
Pub Date : 2015-08-06 DOI: 10.1109/ICME.2015.7177441
Junjie Cai, Richang Hong, Meng Wang, Q. Tian
Indexing is a critical step for searching digital images in a large database. To date, how to design a discriminative and compact indexing strategy remains a challenging issue, partly due to the well-known semantic gap between user queries and the rich semantics of large-scale datasets. In this paper, we propose to construct a novel joint semantic-visual space by leveraging visual descriptors and semantic attributes, which aims to narrow the semantic gap by taking both attributes and indexing into one framework. Such a joint space offers the flexibility of conducting Coherent Semantic-visual Indexing, which employs binary codes to boost retrieval speed with satisfactory accuracy. To solve the proposed model effectively, this paper makes three contributions. First, we propose an interactive optimization method to find the joint space of semantic and visual descriptors. Second, we prove the convergence property of our optimization algorithm, which guarantees that our system will find a good solution within a certain number of rounds. Finally, we integrate the semantic-visual joint space with spectral hashing, which provides an efficient solution for searching datasets of up to a million images. Experiments on two standard retrieval datasets, i.e., Holidays1M and Oxford5K, show that the proposed method achieves promising performance compared with the state of the art.
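Neither the joint-space optimization nor the exact spectral hashing variant can be reconstructed from the abstract, so the following sketch only illustrates the generic "binary codes for fast retrieval" step with a simple sign-of-PCA hash and Hamming ranking; it is a stand-in, not the paper's Coherent Semantic-visual Indexing, and the descriptors here are random placeholders for the joint semantic-visual features.

```python
import numpy as np

def train_pca_hash(features, n_bits=32):
    """Learn a simple sign-of-PCA binary hashing function.

    A stand-in for the spectral hashing step mentioned in the abstract; only
    the generic 'project then binarize' idea is illustrated.
    """
    mean = features.mean(axis=0)
    _, _, vt = np.linalg.svd(features - mean, full_matrices=False)
    proj = vt[:n_bits].T                        # (dim, n_bits) projection
    return mean, proj

def encode(features, mean, proj):
    return ((features - mean) @ proj > 0).astype(np.uint8)    # (n, n_bits) codes

def hamming_rank(query_code, database_codes):
    return np.argsort(np.count_nonzero(database_codes != query_code, axis=1))

# Toy "joint semantic-visual" descriptors (random vectors standing in for them).
rng = np.random.default_rng(0)
database = rng.standard_normal((10_000, 128))
mean, proj = train_pca_hash(database, n_bits=32)
codes = encode(database, mean, proj)
query = database[42] + 0.01 * rng.standard_normal(128)
print(hamming_rank(encode(query[None], mean, proj)[0], codes)[:5])   # item 42 should rank first
```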
Citations: 5
Single image super-resolution via 2D sparse representation
Pub Date : 2015-08-06 DOI: 10.1109/ICME.2015.7177485
Na Qi, Yunhui Shi, Xiaoyan Sun, Wenpeng Ding, Baocai Yin
Image super-resolution with a sparsity prior provides promising performance. However, traditional sparsity-based super-resolution methods transform a two-dimensional (2D) image into a one-dimensional (1D) vector, which ignores the intrinsic 2D structure as well as the spatial correlation inherent in images. In this paper, we propose the first image super-resolution method that reconstructs a high-resolution image from its low-resolution counterpart via a two-dimensional sparse model. Correspondingly, we present a new dictionary learning algorithm that fully exploits the correspondence between the two pairs of 2D dictionaries of low- and high-resolution images. Experimental results demonstrate that our proposed image super-resolution with the 2D sparse model outperforms state-of-the-art super-resolution methods based on 1D sparse models in terms of both reconstruction ability and memory usage.
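For context, the sketch below shows the conventional 1D (vectorized-patch) coupled-dictionary pipeline that this paper improves upon, not the proposed 2D sparse model: low- and high-resolution patch pairs form a toy coupled dictionary, a low-resolution patch is sparsely coded over the LR dictionary with orthogonal matching pursuit, and the HR patch is rebuilt from the corresponding HR atoms. Patch sizes, dictionary size, and the test image are all made up.

```python
import numpy as np
from sklearn.linear_model import orthogonal_mp

def extract_patch_pairs(hr_image, n_atoms=256, hr_size=8, rng=None):
    """Build a toy coupled dictionary: columns are vectorized HR patches and
    their 2x-downsampled LR counterparts (simple 2x2 averaging)."""
    if rng is None:
        rng = np.random.default_rng(0)
    H, W = hr_image.shape
    d_hr, d_lr = [], []
    for _ in range(n_atoms):
        y, x = rng.integers(0, H - hr_size), rng.integers(0, W - hr_size)
        hr = hr_image[y:y + hr_size, x:x + hr_size]
        lr = hr.reshape(hr_size // 2, 2, hr_size // 2, 2).mean(axis=(1, 3))
        d_hr.append(hr.ravel())
        d_lr.append(lr.ravel())
    D_hr, D_lr = np.array(d_hr).T, np.array(d_lr).T        # (dim, n_atoms)
    norms = np.linalg.norm(D_lr, axis=0) + 1e-12
    return D_hr / norms, D_lr / norms                        # jointly rescaled atoms

def super_resolve_patch(lr_patch, D_lr, D_hr, k=5):
    """Encode the LR patch over D_lr with OMP and rebuild the HR patch with D_hr."""
    coef = orthogonal_mp(D_lr, lr_patch.ravel(), n_nonzero_coefs=k)
    return (D_hr @ coef).reshape(8, 8)

# Toy usage: dictionaries from one smooth synthetic image, SR of one LR patch.
rng = np.random.default_rng(1)
xx, yy = np.meshgrid(np.linspace(0, 3, 128), np.linspace(0, 3, 128))
hr_image = np.sin(4 * xx) * np.cos(3 * yy)
D_hr, D_lr = extract_patch_pairs(hr_image, rng=rng)
hr_patch = hr_image[40:48, 40:48]
lr_patch = hr_patch.reshape(4, 2, 4, 2).mean(axis=(1, 3))
print(np.abs(super_resolve_patch(lr_patch, D_lr, D_hr) - hr_patch).mean())
```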
Citations: 12
Optimization of the number of rays in interpolation for light field based free viewpoint systems
Pub Date : 2015-08-06 DOI: 10.1109/ICME.2015.7177463
H. Shidanshidi, F. Safaei, W. Li
Light field (LF) rendering is widely used in free viewpoint video (FVV) systems. Different methods have been proposed that employ depth maps to improve the rendering quality. However, depth estimation is often error-prone. In this paper, a new method based on the concept of effective sampling density (ESD) is proposed for evaluating depth-based LF rendering algorithms at different levels of depth-estimation error. In addition, for a given rendering quality, we provide an estimate of the number of rays required in the interpolation algorithm to compensate for the adverse effect caused by errors in the depth maps. The proposed method is particularly useful for designing a rendering algorithm that achieves the required rendering quality despite inaccurate knowledge of depth. Both the theoretical study and numerical simulations have verified the efficacy of the proposed method.
Citations: 1
A probabilistic model for food image recognition in restaurants
Pub Date : 2015-08-06 DOI: 10.1109/ICME.2015.7177464
Luis Herranz, Ruihan Xu, Shuqiang Jiang
A large number of food photos are taken in restaurants for diverse reasons. This dish recognition problem is very challenging due to different cuisines, cooking styles, and the intrinsic difficulty of modeling food from its visual appearance. Contextual knowledge is crucial to improve recognition in such a scenario. In particular, geocontext has been widely exploited for outdoor landmark recognition. Similarly, we exploit knowledge about restaurant menus and the geolocation of restaurants and test images. We first adapt a framework based on discarding unlikely categories located far from the test image. Then we reformulate the problem using a probabilistic model connecting dishes, restaurants, and geolocations. We apply that model in three different tasks: dish recognition, restaurant recognition, and geolocation refinement. Experiments on a dataset including 187 restaurants and 701 dishes show that combining multiple sources of evidence (visual, geolocation, and external knowledge) can boost performance in all tasks.
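The paper's exact probabilistic model is not given in the abstract; the sketch below shows one natural way such evidence could be combined, P(dish | image, location) ∝ P(image | dish) · Σ_r P(dish | restaurant r) · P(restaurant r | location), with every restaurant, dish, and probability invented for the example.

```python
import numpy as np

# Hypothetical setup: 2 nearby restaurants, 4 dishes, made-up probabilities.
dishes = ["ramen", "sushi", "pizza", "pasta"]

# P(restaurant | GPS fix), e.g. from the distance between the photo and each venue.
p_restaurant_given_location = np.array([0.7, 0.3])

# P(dish | restaurant): uniform over each restaurant's menu (external knowledge).
p_dish_given_restaurant = np.array([
    [0.5, 0.5, 0.0, 0.0],   # restaurant 0 serves ramen and sushi
    [0.0, 0.0, 0.5, 0.5],   # restaurant 1 serves pizza and pasta
])

# P(image | dish): visual classifier scores for the query photo (made up here).
p_image_given_dish = np.array([0.10, 0.05, 0.60, 0.25])

# Combine the evidence: P(dish | image, location)
#   ∝ P(image | dish) * sum_r P(dish | restaurant r) * P(restaurant r | location)
prior = p_restaurant_given_location @ p_dish_given_restaurant
posterior = p_image_given_dish * prior
posterior /= posterior.sum()

for dish, p in sorted(zip(dishes, posterior), key=lambda t: -t[1]):
    print(f"{dish:6s} {p:.3f}")
```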
Citations: 21