
Latest articles — 2015 IEEE International Conference on Multimedia and Expo (ICME)

Edge-preserving image smoothing with local constraints on gradient and intensity
Pub Date : 2015-06-01 DOI: 10.1109/ICME.2015.7177403
Pan Shao, Shouhong Ding, Lizhuang Ma
We present a new edge-preserving image smoothing approach that incorporates local features into a holistic optimization framework. Our method combines a gradient constraint that enforces detail elimination with an intensity constraint that maintains shape. The gradients of high-contrast details are suppressed to a lower magnitude, after which structural edges can be located. The intensities of a small region are regulated to resemble the initial fabric, which facilitates further detail capture. Experimental results indicate that the proposed algorithm, aided by a sparse gradient counting mechanism, can properly smooth non-edge regions even when textures and structures are similar in scale. The effectiveness of our approach is demonstrated in the context of detail manipulation, edge detection, and image abstraction.
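The general idea behind the abstract above — smooth away small texture gradients while leaving high-contrast structural edges untouched — can be illustrated with a minimal edge-preserving diffusion sketch. This is an illustration of the concept only, not the authors' optimization framework; the threshold and step size are assumptions.

```python
import numpy as np

def edge_preserving_smooth(img, grad_thresh=0.1, n_iter=50, step=0.2):
    """Diffuse each pixel toward its 4-neighbours, but only along
    directions whose finite-difference gradient is below grad_thresh,
    so small texture gradients are smoothed away while high-contrast
    structural edges are left intact."""
    u = img.astype(float).copy()
    for _ in range(n_iter):
        upd = np.zeros_like(u)
        for axis, shift in ((0, -1), (0, 1), (1, -1), (1, 1)):
            d = np.roll(u, shift, axis=axis) - u  # difference to one neighbour
            upd += np.where(np.abs(d) < grad_thresh, d, 0.0)
        u += step * upd  # explicit diffusion step (stable for step <= 0.25)
    return u
```

Applied to a noisy step image, the flat halves flatten out while the step itself (whose gradient exceeds the threshold) keeps its full contrast.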
Citations: 1
Data-oriented multi-index hashing
Pub Date : 2015-06-01 DOI: 10.1109/ICME.2015.7177420
Qingyun Liu, Hongtao Xie, Yizhi Liu, Chuang Zhang, Li Guo
Multi-index hashing (MIH) is the state-of-the-art method for indexing binary codes: it divides long codes into substrings and builds multiple hash tables. However, MIH rests on the assumption that dataset codes are uniformly distributed, and it loses efficiency on non-uniformly distributed codes. Besides, many results share the same Hamming distance to a query, which makes the distance measure ambiguous. In this paper, we propose a data-oriented multi-index hashing method. We first compute the covariance matrix of the bits and learn an adaptive projection vector for each binary substring. Instead of using substrings as direct indices into hash tables, we project them with the corresponding projection vectors to generate new indices. With adaptive projection, the indices in each hash table are nearly uniformly distributed. Then, using the covariance matrix, we propose a ranking method for the binary codes: by assigning different bit-level weights to different bits, the returned binary codes are ranked at a finer-grained binary code level. Experiments conducted on reference large-scale datasets show that, compared to MIH, the time performance of our method can be improved by 36.9%-87.4%, and the search accuracy can be improved by 22.2%.
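The baseline this paper extends — standard multi-index hashing with one hash table per substring — can be sketched as follows. The paper's contribution (learned projections that make table indices uniform for non-uniform codes) is not shown; this is only the substring/pigeonhole machinery it builds on.

```python
from collections import defaultdict

def split_code(code, m, bits):
    """Split an integer binary code into m substrings of bits // m bits."""
    w = bits // m
    mask = (1 << w) - 1
    return [(code >> (i * w)) & mask for i in range(m)]

class MultiIndexHash:
    """Baseline multi-index hashing: one hash table per substring.
    Pigeonhole: if two codes differ in at most r < m bits, at least one
    of their m substrings is identical, so probing each table with the
    query's substrings finds every r-neighbour; candidates are then
    filtered by exact Hamming distance."""
    def __init__(self, m=4, bits=32):
        self.m, self.bits = m, bits
        self.tables = [defaultdict(list) for _ in range(m)]
        self.codes = []

    def add(self, code):
        self.codes.append(code)
        for table, sub in zip(self.tables, split_code(code, self.m, self.bits)):
            table[sub].append(len(self.codes) - 1)

    def query(self, code, r):
        """Ids of stored codes within Hamming distance r (requires r < m)."""
        cand = set()
        for table, sub in zip(self.tables, split_code(code, self.m, self.bits)):
            cand.update(table.get(sub, []))
        return [i for i in cand
                if bin(self.codes[i] ^ code).count("1") <= r]
```

With m = 4 tables, any code within Hamming distance 3 of the query shares at least one exact 8-bit substring with it, so exact-match probes suffice for r up to m - 1.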
Citations: 3
Group sensitive Classifier Chains for multi-label classification
Pub Date : 2015-06-01 DOI: 10.1109/ICME.2015.7177400
Jun Huang, Guorong Li, Shuhui Wang, W. Zhang, Qingming Huang
In multi-label classification, labels often have correlations with each other, and exploiting label correlations can improve the performance of classifiers. Current multi-label classification methods mainly consider global label correlations. However, the label correlations may differ across different data groups. In this paper, we propose a simple and efficient framework for multi-label classification, called Group sensitive Classifier Chains. We assume that similar examples not only share the same label correlations, but also tend to have similar labels. We augment the original feature space with the label space and cluster examples into groups, then learn the label dependency graph of each group and build classifier chains on each group-specific label dependency graph. The group-specific classifier chains built on the test example's nearest group are used for prediction. Comparisons with state-of-the-art approaches demonstrate the competitive performance of our method.
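A minimal sketch of the classifier-chain structure the paper builds on: each label's classifier sees the input features plus the labels earlier in the chain, so label correlations propagate through the chain. The base learner here is a simple linear least-squares scorer chosen for self-containedness (an assumption, not the paper's choice), and the paper's grouping step — one chain per example cluster — is omitted for clarity.

```python
import numpy as np

class ClassifierChain:
    """Minimal classifier chain: one linear least-squares scorer per
    label. At training time each scorer sees the true earlier labels;
    at prediction time it sees the earlier scorers' predictions."""
    def fit(self, X, Y):
        self.W = []
        Z = np.hstack([X, np.ones((len(X), 1))])  # features + bias
        for j in range(Y.shape[1]):
            w, *_ = np.linalg.lstsq(Z, Y[:, j], rcond=None)
            self.W.append(w)
            Z = np.hstack([Z, Y[:, [j]]])  # true label joins the features
        return self

    def predict(self, X):
        Z = np.hstack([X, np.ones((len(X), 1))])
        preds = []
        for w in self.W:
            p = (Z @ w > 0.5).astype(float)
            preds.append(p)
            Z = np.hstack([Z, p[:, None]])  # predicted label feeds the next scorer
        return np.stack(preds, axis=1)
```

On data where the second label simply copies the first, the chain learns to read it straight off the previous prediction.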
Citations: 24
Evaluating the efficacy of RGB-D cameras for surveillance
Pub Date : 2015-06-01 DOI: 10.1109/ICME.2015.7177415
S. Raghuraman, K. Bahirat, B. Prabhakaran
RGB-D cameras have enabled real-time 3D video processing for numerous computer vision applications, especially surveillance-type applications. In this paper, we first present a real-time anti-forensic 3D object stream manipulation framework to capture and manipulate live RGB-D data streams, creating realistic images/videos that show individuals performing activities they did not actually do. The framework uses computer vision and graphics methods to render photorealistic animations of live mesh models captured with the camera. Next, we had users who are computer vision and graphics scientists visually inspect the manipulated RGB-D streams (just as security personnel would). The study shows that it was significantly difficult to distinguish between the real and reconstructed renderings of such 3D video sequences, clearly showing the potential security risk involved. Finally, we investigate the efficacy of forensic approaches for detecting such manipulations.
Citations: 8
Human interaction recognition in the wild: Analyzing trajectory clustering from multiple-instance-learning perspective
Pub Date : 2015-06-01 DOI: 10.1109/ICME.2015.7177480
Bo Zhang, Paolo Rota, N. Conci, F. D. Natale
In this paper, we propose a framework to recognize complex human interactions. First, we adopt trajectories to represent human motion in a video. Then, the extracted trajectories are clustered into groups (termed local motion patterns) using the coherent filtering algorithm. As trajectories within the same group exhibit similar motion properties (i.e., velocity and direction), we adopt the histogram of large-displacement optical flow (denoted HO-LDOF) as the group motion feature vector. Thus, each video can be compactly represented by a collection of local motion patterns described by the HO-LDOF. Finally, classification is achieved using citation-KNN, a typical multiple-instance-learning algorithm. Experimental results on the TV human interaction dataset and the UT human interaction dataset demonstrate the applicability of our method.
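At its core, a descriptor like HO-LDOF is a direction histogram over flow vectors whose displacement exceeds a magnitude threshold. A hedged sketch with assumed parameters (8 equal angular sectors, L1 normalization — the paper's exact binning is not specified here):

```python
import numpy as np

def flow_direction_histogram(flow, mag_thresh=1.0, n_bins=8):
    """Direction histogram of 'large-displacement' flow vectors:
    discard vectors shorter than mag_thresh, quantize the remaining
    directions into n_bins equal sectors, and L1-normalize."""
    flow = np.asarray(flow, dtype=float)          # shape (N, 2): (dx, dy)
    mag = np.hypot(flow[:, 0], flow[:, 1])
    big = flow[mag >= mag_thresh]                 # keep large displacements only
    hist = np.zeros(n_bins)
    if len(big) == 0:
        return hist
    ang = np.arctan2(big[:, 1], big[:, 0]) % (2 * np.pi)   # angle in [0, 2*pi)
    bins = np.floor(ang / (2 * np.pi / n_bins)).astype(int) % n_bins
    np.add.at(hist, bins, 1.0)
    return hist / hist.sum()
```

One such histogram per trajectory group yields a bag of feature vectors per video, which is exactly the bag-of-instances representation that citation-KNN consumes.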
Citations: 4
Sparse nonlinear representation for voice conversion
Pub Date : 2015-06-01 DOI: 10.1109/ICME.2015.7177437
Toru Nakashika, T. Takiguchi, Y. Ariki
In voice conversion, sparse-representation-based methods have recently been garnering attention because they are relatively unaffected by over-fitting and over-smoothing problems. In these approaches, voice conversion is achieved by estimating a sparse vector, computed by matching the input vector against the source speaker's dictionaries, that determines which of the target speaker's dictionaries should be used. Sparse-representation-based voice conversion methods can be broadly divided into two approaches: 1) using raw acoustic features from the training data as parallel dictionaries, and 2) training parallel dictionaries from the training data. We follow the latter approach and systematically estimate the parallel dictionaries using a joint-density restricted Boltzmann machine with sparse constraints. Through voice-conversion experiments, we confirmed the high performance of our method, comparing it with the conventional Gaussian mixture model (GMM)-based approach and a non-negative matrix factorization (NMF)-based approach, which is based on sparse representation.
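The exemplar-based (NMF-style) baseline mentioned in the abstract — not the authors' RBM model — can be sketched as follows: non-negative activations are estimated over the source speaker's dictionary and then reused on the paired target dictionary. Assumes non-negative features and dictionaries; the multiplicative update is the standard NMF rule for a Euclidean objective.

```python
import numpy as np

def convert_frame(x, A_src, A_tgt, n_iter=200, eps=1e-9):
    """Exemplar-based conversion sketch: find non-negative activations h
    with x ~= A_src @ h via multiplicative updates, then map through the
    paired target dictionary: y = A_tgt @ h. Columns of A_src and A_tgt
    are assumed to be aligned source/target exemplar pairs."""
    h = np.full(A_src.shape[1], 1.0)
    for _ in range(n_iter):
        # NMF-style multiplicative update; keeps h non-negative
        h *= (A_src.T @ x) / (A_src.T @ (A_src @ h) + eps)
    return A_tgt @ h, h
```

When the input frame equals one source exemplar exactly, the activations concentrate on that exemplar and the output is the corresponding target exemplar.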
Citations: 3
AutoRhythm: A music game with automatic hit-time generation and percussion identification
Pub Date : 2015-06-01 DOI: 10.1109/ICME.2015.7177487
P. Chen, Tzu-Chun Yeh, J. Jang, Wenshan Liou
This paper describes a music rhythm game called AutoRhythm, which can automatically generate the hit times for a rhythm game from a given piece of music, and identify user-defined percussions in real time while the user plays. More specifically, AutoRhythm can generate the hit times of the given music either locally or via server-based computation, so users can play the game with their own music directly. Moreover, to make the rhythm game more realistic, AutoRhythm allows users to interact with the game via any object that produces a percussive sound, such as a pen or a chopstick hitting a table. AutoRhythm identifies these percussions in real time while the music is playing; the identification is based on the power spectrum of each frame of the recording, which mixes the percussions with the playback music. On a test dataset of 12 recordings (with 2455 percussions of 4 types), our experiment indicates an F-measure of 96.79%, which is satisfactory for the purpose of the game. The flexibility of using any user-supplied music and identifying user-defined percussions from any object at hand makes the game innovative and unique.
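Automatic hit-time generation requires onset times from the audio. The paper does not spell out its method here, so as an assumed, generic stand-in, a spectral-flux onset detector can be sketched: frame the signal, sum the positive per-bin change of the magnitude spectrum between consecutive frames, and pick local peaks of that flux curve.

```python
import numpy as np

def onset_times(signal, sr, frame=1024, hop=512, k=1.0):
    """Spectral-flux onset detector: report times of local flux peaks
    that exceed mean + k * std of the flux curve."""
    n = 1 + (len(signal) - frame) // hop
    win = np.hanning(frame)
    mags = np.array([np.abs(np.fft.rfft(win * signal[i * hop:i * hop + frame]))
                     for i in range(n)])
    # positive spectral change between consecutive frames, summed over bins
    flux = np.maximum(mags[1:] - mags[:-1], 0.0).sum(axis=1)
    thresh = flux.mean() + k * flux.std()
    peaks = [i for i in range(1, len(flux) - 1)
             if flux[i] > thresh and flux[i] >= flux[i - 1] and flux[i] >= flux[i + 1]]
    return [(i + 1) * hop / sr for i in peaks]
```

With a 512-sample hop at 8 kHz, the reported times are quantized to 64 ms frames, which is coarse but adequate for seeding a rhythm-game hit chart.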
Citations: 4
Predicting image caption by a unified hierarchical model
Pub Date : 2015-06-01 DOI: 10.1109/ICME.2015.7177427
Lin Bai, Kan Li
Automatically describing the content of an image is a challenging task in artificial intelligence. The difficulty is particularly pronounced in activity recognition and in deriving the image caption from a relationship analysis of the activities involved in the image. This paper presents a unified hierarchical model of the interaction activity between a human and a nearby object, which then infers the image content by analyzing the logical relationships among the interaction activities. In our model, the first-layer factored three-way interaction machine models the 3D spatial context between the human and the relevant object to directly aid the prediction of human-object interaction activities. The activities are then further processed by the top-layer factored three-way interaction machine to learn the image content with the help of the 3D spatial context among the activities. Experiments on a joint dataset show that our unified hierarchical model outperforms the state of the art in predicting human-object interaction activities and describing image captions.
Citations: 4
Fast Two-Cycle level set tracking with narrow perception of background
Pub Date : 2015-06-01 DOI: 10.1109/ICME.2015.7177471
Yaochen Li, Yuanqi Su, Yuehu Liu
The problem of tracking foreground objects in a video sequence with a moving background remains challenging. In this paper, we propose the Fast Two-Cycle level set method with Narrow band Background (FTCNB) to automatically extract the foreground objects in such video sequences. The level set curve evolution consists of two successive cycles: one cycle for the data-dependent term and a second for smoothness regularization. The curve evolution is implemented by computing the signs of region competition terms on two linked lists of contour pixels rather than by solving Partial Differential Equations (PDEs). Maximum A Posteriori (MAP) optimization is applied in the FTCNB method for curve refinement with the assistance of optical flow. The comparison with other level set methods demonstrates the tracking accuracy of our method, and its tracking speed also outperforms traditional level set methods.
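The core trick — evolving the contour by the sign of a region-competition term evaluated only at boundary pixels, with no PDE solves — can be illustrated with a simplified single-sweep sketch. The paper's actual two-cycle scheme (linked lists of contour pixels, smoothness cycle, narrow-band background, MAP refinement) is omitted; this only shows the sign-based pixel flipping on a two-region intensity model.

```python
import numpy as np

def region_competition_track(img, mask, n_iter=50):
    """Flip each boundary pixel to the region (foreground/background)
    whose mean intensity is closer, recomputing the means each sweep.
    The sign comparison plays the role of the region-competition term;
    no PDEs are solved."""
    mask = mask.astype(bool).copy()
    for _ in range(n_iter):
        # boundary: pixels with at least one 4-neighbour of opposite label
        nb = np.zeros_like(mask, dtype=bool)
        for axis, shift in ((0, 1), (0, -1), (1, 1), (1, -1)):
            nb |= np.roll(mask, shift, axis=axis) != mask
        c_in = img[mask].mean() if mask.any() else 0.0
        c_out = img[~mask].mean() if (~mask).any() else 0.0
        flip_in = nb & ~mask & (np.abs(img - c_in) < np.abs(img - c_out))
        flip_out = nb & mask & (np.abs(img - c_out) < np.abs(img - c_in))
        if not (flip_in.any() or flip_out.any()):
            break  # converged: no boundary pixel wants to switch regions
        mask[flip_in] = True
        mask[flip_out] = False
    return mask
```

Because only boundary pixels are visited and each sweep is a handful of array operations, the contour moves one pixel layer per sweep at very low cost, which is the speed argument behind list-based level set evolution.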
Citations: 3
Early event detection in audio streams
Pub Date : 2015-06-01 DOI: 10.1109/ICME.2015.7177439
Huy Phan, M. Maass, Radoslaw Mazur, A. Mertins
Audio event detection has been an active field of research in recent years. However, most of the proposed methods, if not all, analyze and detect complete events, and little attention has been paid to early detection. In this paper, we present a system that enables early audio event detection in continuous audio recordings, in which an event can be reliably recognized when only a partial duration is observed. Our evaluation on the ITC-Irst database, one of the standard databases of the CLEAR 2006 evaluation, shows that, on the one hand, the proposed system outperforms the best baseline system by 16% and 8% in terms of detection error rate and detection accuracy, respectively; on the other hand, even partial events are enough to achieve the performance obtainable when whole events are observed.
{"title":"Early event detection in audio streams","authors":"Huy Phan, M. Maass, Radoslaw Mazur, A. Mertins","doi":"10.1109/ICME.2015.7177439","DOIUrl":"https://doi.org/10.1109/ICME.2015.7177439","url":null,"abstract":"Audio event detection has been an active field of research in recent years. However, most of the proposed methods, if not all, analyze and detect complete events and little attention has been paid for early detection. In this paper, we present a system which enables early audio event detection in continuous audio recordings in which an event can be reliably recognized when only a partial duration is observed. Our evaluation on the ITC-Irst database, one of the standard database of the CLEAR 2006 evaluation, shows that: on one hand, the proposed system outperforms the best baseline system by 16% and 8% in terms of detection error rate and detection accuracy respectively; on the other hand, even partial events are enough to achieve the performance that is obtainable when the whole events are observed.","PeriodicalId":146271,"journal":{"name":"2015 IEEE International Conference on Multimedia and Expo (ICME)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133472390","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Cited 17 times
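The abstract above hinges on recognizing an event from only a partial duration. A generic sketch of that online early-decision idea is a CUSUM-style running score over per-frame event scores, firing as soon as the accumulated evidence crosses a threshold; the scores and threshold here are hypothetical, and the paper's actual features and classifier are not reproduced.

```python
def early_detect(frame_scores, threshold=3.0):
    """Declare an event as soon as running evidence crosses a threshold,
    i.e. after observing only part of the event (assumed scoring scheme).

    frame_scores: per-frame event scores, e.g. classifier log-likelihood
    ratios (positive when the frame looks like the target event)."""
    evidence = 0.0
    for t, s in enumerate(frame_scores):
        evidence = max(0.0, evidence + s)   # CUSUM-style accumulation
        if evidence >= threshold:
            return t                        # frame index of the early decision
    return None                             # no event detected
```

With a toy stream where background frames score -0.2 and event frames +0.5, an event starting at frame 40 is declared at frame 45, i.e. six frames into the event rather than at its end; the threshold trades off latency against false alarms.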