Edge-preserving image smoothing with local constraints on gradient and intensity
Pub Date: 2015-06-01 | DOI: 10.1109/ICME.2015.7177403
Pan Shao, Shouhong Ding, Lizhuang Ma
We present a new edge-preserving image smoothing approach that incorporates local features into a holistic optimization framework. Our method combines a gradient constraint to enforce detail elimination with an intensity constraint to maintain shape. The gradients of high-contrast details are suppressed to a lower magnitude, after which structural edges can be located. The intensities of each small region are regulated to resemble the original image, which facilitates further detail capture. Experimental results indicate that the proposed algorithm, aided by a sparse gradient counting mechanism, can properly smooth non-edge regions even when textures and structures are similar in scale. The effectiveness of our approach is demonstrated in the context of detail manipulation, edge detection, and image abstraction.
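To illustrate the flavor of such an optimization, here is a minimal edge-aware smoother combining an intensity (data-fidelity) term with an edge-weighted gradient penalty. It is a generic sketch, not the authors' formulation: their gradient constraint and sparse gradient counting mechanism are replaced by a simple fixed edge-aware weight `w`.

```python
import numpy as np

def smooth(I, lam=0.2, eps=0.01, iters=300, step=0.1):
    """Generic edge-aware smoother (illustrative; NOT the paper's model).

    Minimizes  sum (S - I)^2 + lam * sum w |grad S|^2  by gradient
    descent, where w = eps / (|grad I|^2 + eps) is close to 1 in flat
    or low-contrast regions (smooth them) and small across strong
    structural edges (preserve them)."""
    S = I.astype(float).copy()
    gx = np.diff(I, axis=1, append=I[:, -1:])   # guide-image gradients
    gy = np.diff(I, axis=0, append=I[-1:, :])
    wx = eps / (gx ** 2 + eps)                  # edge-aware weights
    wy = eps / (gy ** 2 + eps)
    zc = np.zeros((S.shape[0], 1))
    zr = np.zeros((1, S.shape[1]))
    for _ in range(iters):
        sx = np.diff(S, axis=1, append=S[:, -1:])
        sy = np.diff(S, axis=0, append=S[-1:, :])
        # divergence of the weighted gradient field
        div = (np.diff(wx * sx, axis=1, prepend=zc)
               + np.diff(wy * sy, axis=0, prepend=zr))
        S -= step * ((S - I) - lam * div)       # data term + smoothness term
    return S
```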
{"title":"Edge-preserving image smoothing with local constraints on gradient and intensity","authors":"Pan Shao, Shouhong Ding, Lizhuang Ma","doi":"10.1109/ICME.2015.7177403","DOIUrl":"https://doi.org/10.1109/ICME.2015.7177403","url":null,"abstract":"We present a new edge-preserving image smoothing approach by incorporating local features into a holistic optimization framework. Our method embodies a gradient constraint to enforce detail eliminating and an intensity constraint to achieve shape maintaining. The gradients of high-contrast details are suppressed to a lower magnitude, subsequent to which structural edges can be located. The intensities of a small region are regulated to resemble the initial fabric, which facilitates further detail capture. Experimental results indicate that the proposed algorithm, availed by a sparse gradient counting mechanism, can properly smooth non-edge regions even when textures and structures are similar in scale. The effectiveness of our approach is demonstrated in the context of detail manipulation, edge detection, and image abstraction.","PeriodicalId":146271,"journal":{"name":"2015 IEEE International Conference on Multimedia and Expo (ICME)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126304184","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Data-oriented multi-index hashing
Pub Date: 2015-06-01 | DOI: 10.1109/ICME.2015.7177420
Qingyun Liu, Hongtao Xie, Yizhi Liu, Chuang Zhang, Li Guo
Multi-index hashing (MIH) is the state-of-the-art method for indexing binary codes: it divides long codes into substrings and builds multiple hash tables. However, MIH assumes that the dataset codes are uniformly distributed and loses efficiency on non-uniformly distributed codes. Moreover, many results share the same Hamming distance to a query, which makes the distance measure ambiguous. In this paper, we propose a data-oriented multi-index hashing method. We first compute the covariance matrix of the bits and learn an adaptive projection vector for each binary substring. Instead of using substrings as direct indices into hash tables, we project them with the corresponding projection vectors to generate new indices. With adaptive projection, the indices in each hash table are nearly uniformly distributed. Then, using the covariance matrix, we propose a ranking method for the binary codes: by assigning different weights to different bits, the returned binary codes are ranked at a finer-grained, binary-code level. Experiments on large-scale reference datasets show that, compared to MIH, our method improves time performance by 36.9%-87.4% and search accuracy by 22.2%.
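As a concrete picture of the indexing scheme the paper builds on, here is a minimal Python sketch of plain multi-index hashing: substring tables plus pigeonhole-based candidate probing. It omits the paper's learned per-substring projections, and the values of `m`, `bits`, and the radius-1 probing limit are illustrative choices.

```python
from collections import defaultdict

def split(code, m, bits=64):
    """Split an integer binary code into m equal substrings."""
    step = bits // m
    mask = (1 << step) - 1
    return [(code >> (i * step)) & mask for i in range(m)]

class MultiIndexHash:
    """Minimal plain-MIH sketch (no adaptive projections)."""
    def __init__(self, codes, m=4, bits=64):
        self.codes, self.m, self.bits = codes, m, bits
        self.tables = [defaultdict(list) for _ in range(m)]
        for idx, c in enumerate(codes):
            for t, s in zip(self.tables, split(c, m, bits)):
                t[s].append(idx)

    def query(self, q, r):
        # Pigeonhole: if ham(q, c) <= r, some substring of c lies within
        # floor(r/m) of the matching substring of q. For brevity we only
        # probe radius-0 and radius-1 buckets, then verify candidates.
        sub_r = r // self.m
        step = self.bits // self.m
        cands = set()
        for t, s in zip(self.tables, split(q, self.m, self.bits)):
            probes = {s}
            if sub_r >= 1:                       # enumerate 1-bit flips
                probes |= {s ^ (1 << b) for b in range(step)}
            for p in probes:
                cands.update(t.get(p, []))
        return [i for i in cands if bin(self.codes[i] ^ q).count('1') <= r]
```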
{"title":"Data-oriented multi-index hashing","authors":"Qingyun Liu, Hongtao Xie, Yizhi Liu, Chuang Zhang, Li Guo","doi":"10.1109/ICME.2015.7177420","DOIUrl":"https://doi.org/10.1109/ICME.2015.7177420","url":null,"abstract":"Multi-index hashing (MIH) is the state-of-the-art method for indexing binary codes, as it divides long codes into substrings and builds multiple hash tables. However, MIH is based on the dataset codes uniform distribution assumption, and will lose efficiency in dealing with non-uniformly distributed codes. Besides, there are lots of results sharing the same Hamming distance to a query, which makes the distance measure ambiguous. In this paper, we propose a data-oriented multi-index hashing method. We first compute the covariance matrix of bits and learn adaptive projection vector for each binary substring. Instead of using substrings as direct indices into hash tables, we project them with corresponding projection vectors to generate new indices. With adaptive projection, the indices in each hash table are near uniformly distributed. Then with covariance matrix, we propose a ranking method for the binary codes. By assigning different bit-level weights to different bits, the returned binary codes are ranked at a finer-grained binary code level. Experiments conducted on reference large scale datasets show that compared to MIH the time performance of our method can be improved by 36.9%-87.4%, and the search accuracy can be improved by 22.2%.","PeriodicalId":146271,"journal":{"name":"2015 IEEE International Conference on Multimedia and Expo (ICME)","volume":"453 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123022594","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Group sensitive Classifier Chains for multi-label classification
Pub Date: 2015-06-01 | DOI: 10.1109/ICME.2015.7177400
Jun Huang, Guorong Li, Shuhui Wang, W. Zhang, Qingming Huang
In multi-label classification, labels are often correlated with each other, and exploiting these correlations can improve classifier performance. Current multi-label classification methods mainly consider global label correlations; however, label correlations may differ across data groups. In this paper, we propose a simple and efficient framework for multi-label classification, called Group sensitive Classifier Chains. We assume that similar examples not only share the same label correlations but also tend to have similar labels. We augment the original feature space with the label space and cluster the examples into groups, then learn a label dependency graph for each group and build classifier chains on each group-specific label dependency graph. For prediction, we use the classifier chains built on the group nearest to the test example. Comparisons with state-of-the-art approaches demonstrate the competitive performance of our method.
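A minimal sketch of the group-sensitive idea using scikit-learn's ClassifierChain: cluster in the label-augmented space, fit one chain per group, and route each test example to its nearest group. The label-space weight `alpha` and the use of feature-part centroids at test time are our own assumptions, not details from the paper, and the sketch assumes every group contains both classes of every label.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import LogisticRegression
from sklearn.multioutput import ClassifierChain

def fit_group_cc(X, Y, n_groups=3, alpha=1.0, seed=0):
    """Cluster examples in [features | alpha * labels], then fit one
    classifier chain per cluster (hypothetical parameterization)."""
    Z = np.hstack([X, alpha * Y])            # feature space + label space
    km = KMeans(n_clusters=n_groups, n_init=10, random_state=seed).fit(Z)
    chains, centroids = [], []
    for g in range(n_groups):
        m = km.labels_ == g
        cc = ClassifierChain(LogisticRegression(max_iter=1000),
                             random_state=seed)
        chains.append(cc.fit(X[m], Y[m]))    # assumes both classes present
        centroids.append(X[m].mean(axis=0))  # feature-part centroid
    return chains, np.vstack(centroids)

def predict_group_cc(chains, centroids, X):
    """Predict with the chain of the nearest group (feature distance)."""
    g = np.argmin(((X[:, None, :] - centroids[None, :, :]) ** 2).sum(-1),
                  axis=1)
    Y = np.zeros((len(X), len(chains[0].order_)))
    for gi, cc in enumerate(chains):
        m = g == gi
        if m.any():
            Y[m] = cc.predict(X[m])
    return Y
```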
{"title":"Group sensitive Classifier Chains for multi-label classification","authors":"Jun Huang, Guorong Li, Shuhui Wang, W. Zhang, Qingming Huang","doi":"10.1109/ICME.2015.7177400","DOIUrl":"https://doi.org/10.1109/ICME.2015.7177400","url":null,"abstract":"In multi-label classification, labels often have correlations with each other. Exploiting label correlations can improve the performances of classifiers. Current multi-label classification methods mainly consider the global label correlations. However, the label correlations may be different over different data groups. In this paper, we propose a simple and efficient framework for multi-label classification, called Group sensitive Classifier Chains. We assume that similar examples not only share the same label correlations, but also tend to have similar labels. We augment the original feature space with label space and cluster them into groups, then learn the label dependency graph in each group respectively and build the classifier chains on each group specific label dependency graph. The group specific classifier chains which are built on the nearest group of the test example are used for prediction. Comparison results with the state-of-the-art approaches manifest competitive performances of our method.","PeriodicalId":146271,"journal":{"name":"2015 IEEE International Conference on Multimedia and Expo (ICME)","volume":"16 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130871523","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Evaluating the efficacy of RGB-D cameras for surveillance
Pub Date: 2015-06-01 | DOI: 10.1109/ICME.2015.7177415
S. Raghuraman, K. Bahirat, B. Prabhakaran
RGB-D cameras have enabled real-time 3D video processing for numerous computer vision applications, especially surveillance. In this paper, we first present a real-time anti-forensic 3D object stream manipulation framework that captures and manipulates live RGB-D data streams to create realistic images/videos showing individuals performing activities they did not actually do. The framework uses computer vision and graphics methods to render photorealistic animations of live mesh models captured with the camera. Next, we conducted a visual inspection of the manipulated RGB-D streams (just as security personnel would) by users who are computer vision and graphics scientists. The study shows that it was significantly difficult to distinguish between real and reconstructed renderings of such 3D video sequences, clearly demonstrating the potential security risk involved. Finally, we investigate the efficacy of forensic approaches for detecting such manipulations.
{"title":"Evaluating the efficacy of RGB-D cameras for surveillance","authors":"S. Raghuraman, K. Bahirat, B. Prabhakaran","doi":"10.1109/ICME.2015.7177415","DOIUrl":"https://doi.org/10.1109/ICME.2015.7177415","url":null,"abstract":"RGB-D cameras have enabled real-time 3D video processing for numerous computer vision applications, especially for surveillance type applications. In this paper, we first present a real-time anti-forensic 3D object stream manipulation framework to capture and manipulate live RBG-D data streams to create realistic images/videos showing individuals performing activities they did not actually do. The framework uses computer vision and graphics methods to render photorealistic animations of live mesh models captured using the camera. Next, we conducted a visual inspection of the manipulated RGB-D streams (just like security personnel would do) by users who are computer vision and graphics scientists. The study shows that it was significantly difficult to distinguish between the real or reconstructed rendering of such 3D video sequences, thus clearly showing the potential security risk involved. Finally, we investigate the efficacy of forensic approaches for detecting such manipulations.","PeriodicalId":146271,"journal":{"name":"2015 IEEE International Conference on Multimedia and Expo (ICME)","volume":"122 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130063567","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Human interaction recognition in the wild: Analyzing trajectory clustering from multiple-instance-learning perspective
Pub Date: 2015-06-01 | DOI: 10.1109/ICME.2015.7177480
Bo Zhang, Paolo Rota, N. Conci, F. D. Natale
In this paper, we propose a framework to recognize complex human interactions. First, we adopt trajectories to represent human motion in a video. Then, the extracted trajectories are clustered into groups (called local motion patterns) using the coherent filtering algorithm. Since trajectories within the same group exhibit similar motion properties (i.e., velocity and direction), we adopt the histogram of large-displacement optical flow (HO-LDOF) as the group motion feature vector. Each video can thus be compactly represented by a collection of local motion patterns described by the HO-LDOF. Finally, classification is performed using citation-KNN, a typical multiple-instance-learning algorithm. Experimental results on the TV human interaction dataset and the UT human interaction dataset demonstrate the applicability of our method.
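For intuition, a small sketch of an HO-LDOF-style group descriptor: keep only flow vectors whose displacement exceeds a threshold, then histogram their orientations. The bin count and magnitude threshold here are illustrative, not values from the paper.

```python
import numpy as np

def ho_ldof(flow_vecs, n_bins=8, mag_thresh=2.0):
    """Orientation histogram over large-displacement flow vectors of one
    trajectory group (illustrative HO-LDOF-style descriptor).

    flow_vecs : (N, 2) array of (dx, dy) displacements in pixels.
    """
    v = np.asarray(flow_vecs, float)
    mag = np.hypot(v[:, 0], v[:, 1])
    big = v[mag > mag_thresh]                 # keep only large displacements
    if len(big) == 0:
        return np.zeros(n_bins)
    ang = np.arctan2(big[:, 1], big[:, 0]) % (2 * np.pi)
    hist, _ = np.histogram(ang, bins=n_bins, range=(0, 2 * np.pi))
    return hist / hist.sum()                  # L1-normalize
```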
{"title":"Human interaction recognition in the wild: Analyzing trajectory clustering from multiple-instance-learning perspective","authors":"Bo Zhang, Paolo Rota, N. Conci, F. D. Natale","doi":"10.1109/ICME.2015.7177480","DOIUrl":"https://doi.org/10.1109/ICME.2015.7177480","url":null,"abstract":"In this paper, we propose a framework to recognize complex human interactions. First, we adopt trajectories to represent human motion in a video. Then, the extracted trajectories are clustered into different groups (named as local motion patterns) using the coherent filtering algorithm. As trajectories within the same group exhibit similar motion properties (i.e., velocity, direction), we adopt the histogram of large-displacement optical flow (denoted as HO-LDOF) as the group motion feature vector. Thus, each video can be briefly represented by a collection of local motion patterns that are described by the HO-LDOF. Finally, classification is achieved using the citation-KNN, which is a typical multiple-instance-learning algorithm. Experimental results on the TV human interaction dataset and the UT human interaction dataset demonstrate the applicability of our method.","PeriodicalId":146271,"journal":{"name":"2015 IEEE International Conference on Multimedia and Expo (ICME)","volume":"115 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128674119","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Sparse nonlinear representation for voice conversion
Pub Date: 2015-06-01 | DOI: 10.1109/ICME.2015.7177437
Toru Nakashika, T. Takiguchi, Y. Ariki
In voice conversion, sparse-representation-based methods have recently been garnering attention because they are relatively unaffected by over-fitting and over-smoothing problems. In these approaches, voice conversion is achieved by estimating a sparse vector that determines which dictionary atoms of the target speaker should be used, calculated by matching the input vector against the dictionaries of the source speaker. Sparse-representation-based voice conversion methods can be broadly divided into two approaches: 1) those that use raw acoustic features in the training data as parallel dictionaries, and 2) those that train parallel dictionaries from the training data. We follow the latter approach and systematically estimate the parallel dictionaries using a joint-density restricted Boltzmann machine with sparsity constraints. Voice-conversion experiments confirmed the high performance of our method compared with the conventional Gaussian mixture model (GMM)-based approach and a non-negative matrix factorization (NMF)-based approach built on sparse representation.
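To make the sparse-representation pipeline concrete, here is a sketch of the generic parallel-dictionary conversion that the NMF-style baseline follows (the paper's own contribution, joint-density RBM dictionary learning, is not reproduced here): estimate sparse activations on the source dictionary, then reuse them on the target dictionary.

```python
import numpy as np
from sklearn.linear_model import Lasso

def convert_frame(x_src, D_src, D_tgt, alpha=0.1):
    """Convert one spectral frame via parallel dictionaries (sketch).

    x_src : (dim,) source-speaker feature vector.
    D_src, D_tgt : (dim, n_atoms) parallel dictionaries.
    Solves x_src ~= D_src @ h with a sparse, non-negative h, then
    applies the same activations to the target speaker's atoms.
    """
    lasso = Lasso(alpha=alpha, positive=True, fit_intercept=False,
                  max_iter=5000)
    lasso.fit(D_src, x_src)                  # sparse h over source atoms
    h = lasso.coef_
    return D_tgt @ h                         # same activations, target atoms
```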
{"title":"Sparse nonlinear representation for voice conversion","authors":"Toru Nakashika, T. Takiguchi, Y. Ariki","doi":"10.1109/ICME.2015.7177437","DOIUrl":"https://doi.org/10.1109/ICME.2015.7177437","url":null,"abstract":"In voice conversion, sparse-representation-based methods have recently been garnering attention because they are, relatively speaking, not affected by over-fitting or over-smoothing problems. In these approaches, voice conversion is achieved by estimating a sparse vector that determines which dictionaries of the target speaker should be used, calculated from the matching of the input vector and dictionaries of the source speaker. The sparse-representation-based voice conversion methods can be broadly divided into two approaches: 1) an approach that uses raw acoustic features in the training data as parallel dictionaries, and 2) an approach that trains parallel dictionaries from the training data. In our approach, we follow the latter approach and systematically estimate the parallel dictionaries using a joint-density restricted Boltzmann machine with sparse constraints. Through voice-conversion experiments, we confirmed the high-performance of our method, comparing it with the conventional Gaussian mixture model (GMM)-based approach, and a non-negative matrix factorization (NMF)-based approach, which is based on sparse representation.","PeriodicalId":146271,"journal":{"name":"2015 IEEE International Conference on Multimedia and Expo (ICME)","volume":"16 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127795540","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
AutoRhythm: A music game with automatic hit-time generation and percussion identification
Pub Date: 2015-06-01 | DOI: 10.1109/ICME.2015.7177487
P. Chen, Tzu-Chun Yeh, J. Jang, Wenshan Liou
This paper describes a music rhythm game called AutoRhythm, which automatically generates the hit times for a rhythm game from a given piece of music and identifies user-defined percussion sounds in real time while the user plays. More specifically, AutoRhythm generates the hit times of the given music either locally or via server-based computation, so users can play the game directly with their own music. Moreover, to make the game more realistic, AutoRhythm allows users to interact with it via any object that produces a percussive sound, such as a pen or a chopstick hitting a table. AutoRhythm identifies these percussion sounds in real time while the music is playing; the identification is based on the power spectrum of each frame of the recording, which combines the percussion sounds and the playback music. On a test dataset of 12 recordings (with 2455 percussion hits of 4 types), our experiment yields an F-measure of 96.79%, which is satisfactory for the purposes of the game. The flexibility of using any user-supplied music and identifying user-defined percussion sounds from any object at hand makes the game innovative and unique of its kind.
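Hit-time generation of this kind is typically built on onset detection. Below is a minimal spectral-flux onset detector as one plausible realization; the paper does not publish its exact detector, and the frame size, hop, and threshold here are illustrative.

```python
import numpy as np

def hit_times(audio, sr, frame=1024, hop=512):
    """Candidate hit times (seconds) via spectral-flux onset detection.

    audio : 1-D float array of samples; sr : sample rate in Hz.
    """
    n = 1 + (len(audio) - frame) // hop
    win = np.hanning(frame)
    spec = np.abs(np.array([np.fft.rfft(win * audio[i * hop:i * hop + frame])
                            for i in range(n)]))
    # positive spectral flux: summed magnitude increases between frames
    flux = np.maximum(np.diff(spec, axis=0), 0).sum(axis=1)
    thresh = flux.mean() + flux.std()        # simple adaptive threshold
    peaks = [i for i in range(1, len(flux) - 1)
             if flux[i] > thresh and flux[i] >= flux[i - 1]
             and flux[i] > flux[i + 1]]
    return [(p + 1) * hop / sr for p in peaks]
```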
{"title":"AutoRhythm: A music game with automatic hit-time generation and percussion identification","authors":"P. Chen, Tzu-Chun Yeh, J. Jang, Wenshan Liou","doi":"10.1109/ICME.2015.7177487","DOIUrl":"https://doi.org/10.1109/ICME.2015.7177487","url":null,"abstract":"This paper describes a music rhythm game called AutoRhythm, which can automatically generate the hit time for a rhythm game from a given piece of music, and identify user-defined percussions in real time when a user is playing the game. More specifically, AutoRhythm can automatically generate the hit time of the given music, either locally or via server-based computation, such that users can use the user-supplied music for the game directly. Moreover, to make the rhythm game more realistic, AutoRhythm allows users to interact with the game via any objects that can produce percussion sound, such as a pen or a chopstick hitting on the table. AutoRhythm can identify the percussions in real time while the music is playing. The identification is based on the power spectrum of each frame of the recording which combines percussions and playback music. Based on a test dataset of 12 recordings (with 2455 percussions of 4 types), our experiment indicates an F-measure of 96.79%, which is satisfactory for the purpose of the game. The flexibility of being able to use any user-supplied music for the game and to identify user-defined percussions from any objects available at hand makes the game innovative and unique of its kind.","PeriodicalId":146271,"journal":{"name":"2015 IEEE International Conference on Multimedia and Expo (ICME)","volume":"21 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127987698","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Predicting image caption by a unified hierarchical model
Pub Date: 2015-06-01 | DOI: 10.1109/ICME.2015.7177427
Lin Bai, Kan Li
Automatically describing the content of an image is a challenging task in artificial intelligence. The difficulty is particularly pronounced in activity recognition and in producing captions that reflect the relationships among the activities in the image. This paper presents a unified hierarchical model that models the interaction activity between a human and a nearby object, and then infers the image content by analyzing the logical relationships among the interaction activities. In our model, a first-layer factored three-way interaction machine models the 3D spatial context between the human and the relevant object to directly aid the prediction of human-object interaction activities. The activities are then processed by a top-layer factored three-way interaction machine to learn the image content with the help of the 3D spatial context among the activities. Experiments on a joint dataset show that our unified hierarchical model outperforms state-of-the-art methods in predicting human-object interaction activities and generating image captions.
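For the factored three-way interaction at the core of both layers, the scoring can be pictured as follows: rather than a full interaction tensor, each mode is projected onto shared factors and the factor-wise products are summed. This is only the interaction term; the paper's full model (bias terms, training procedure) is not reproduced, and the variable roles below are our own reading.

```python
import numpy as np

def three_way_score(h, o, c, Wh, Wo, Wc):
    """Factored three-way interaction score (illustrative).

    h : (dh,) human feature;  o : (do,) object feature;
    c : (dc,) 3D spatial-context feature.
    Wh (dh, F), Wo (do, F), Wc (dc, F) project each mode onto F
    shared factors; the score sums the factor-wise triple products,
    replacing a full dh x do x dc interaction tensor.
    """
    return np.sum((h @ Wh) * (o @ Wo) * (c @ Wc))
```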
{"title":"Predicting image caption by a unified hierarchical model","authors":"Lin Bai, Kan Li","doi":"10.1109/ICME.2015.7177427","DOIUrl":"https://doi.org/10.1109/ICME.2015.7177427","url":null,"abstract":"Automatically describing the content of an image is a challenging task in artificial intelligence. The difficulty is particularly pronounced in activity recognition and the image caption revealed by the relationship analysis of the activities involved in the image. This paper presents a unified hierarchical model to model the interaction activity between human and nearby object, and then speculates the image content by analyzing the logical relationship among the interaction activities. In our model, the first-layer factored three-way interaction machine models the 3D spatial context between human and the relevant object to straightly aid the prediction of human-object interaction activities. Then, the activities are further processed through the top-layer factored three-way interaction machine to learn the image content with the help of 3D spatial context among the activities. Experiments on joint dataset show that our unified hierarchical model outperforms state-of-the-arts in predicting human-object interaction activities and describing the image caption.","PeriodicalId":146271,"journal":{"name":"2015 IEEE International Conference on Multimedia and Expo (ICME)","volume":"82 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131250707","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Fast Two-Cycle level set tracking with narrow perception of background
Pub Date: 2015-06-01 | DOI: 10.1109/ICME.2015.7177471
Yaochen Li, Yuanqi Su, Yuehu Liu
Tracking foreground objects in a video sequence with a moving background remains challenging. In this paper, we propose the Fast Two-Cycle level set method with Narrow band Background (FTCNB) to automatically extract foreground objects in such video sequences. The level set curve evolution consists of two successive cycles: one for the data-dependent term and a second for smoothness regularization. The curve evolution is implemented by computing the signs of region competition terms on two linked lists of contour pixels rather than by solving partial differential equations (PDEs). Maximum A Posteriori (MAP) optimization is applied in the FTCNB method for curve refinement with the assistance of optical flow. Comparisons with other level set methods demonstrate the tracking accuracy of our method, and its tracking speed outperforms that of traditional level set methods.
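The PDE-free evolution can be pictured with a sketch in the style of the fast two-cycle scheme the method builds on: contour pixels are flipped between inside and outside according to the sign of a speed (region competition) term. This simplified version rebuilds the boundary lists on each sweep instead of maintaining them incrementally, and it omits the smoothness cycle and the MAP refinement.

```python
import numpy as np

def ftc_data_cycle(phi, speed):
    """One data-cycle sweep of a fast two-cycle style evolution (sketch).

    phi   : int array, -1 inside the contour, +1 outside.
    speed : float array of region-competition speeds, same shape.
    """
    H, W = phi.shape

    def neighbors(y, x):
        for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            ny, nx = y + dy, x + dx
            if 0 <= ny < H and 0 <= nx < W:
                yield ny, nx

    # Lout: outside pixels touching the inside (outer boundary list)
    Lout = [(y, x) for y in range(H) for x in range(W)
            if phi[y, x] > 0 and any(phi[n] < 0 for n in neighbors(y, x))]
    for y, x in Lout:                    # switch_in: expand where speed > 0
        if speed[y, x] > 0:
            phi[y, x] = -1
    # Lin: inside pixels touching the outside (inner boundary list)
    Lin = [(y, x) for y in range(H) for x in range(W)
           if phi[y, x] < 0 and any(phi[n] > 0 for n in neighbors(y, x))]
    for y, x in Lin:                     # switch_out: shrink where speed < 0
        if speed[y, x] < 0:
            phi[y, x] = 1
    return phi
```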
Early event detection in audio streams
Pub Date: 2015-06-01 | DOI: 10.1109/ICME.2015.7177439
Huy Phan, M. Maass, Radoslaw Mazur, A. Mertins
Audio event detection has been an active field of research in recent years. However, most of the proposed methods, if not all, analyze and detect complete events and little attention has been paid for early detection. In this paper, we present a system which enables early audio event detection in continuous audio recordings in which an event can be reliably recognized when only a partial duration is observed. Our evaluation on the ITC-Irst database, one of the standard database of the CLEAR 2006 evaluation, shows that: on one hand, the proposed system outperforms the best baseline system by 16% and 8% in terms of detection error rate and detection accuracy respectively; on the other hand, even partial events are enough to achieve the performance that is obtainable when the whole events are observed.
{"title":"Early event detection in audio streams","authors":"Huy Phan, M. Maass, Radoslaw Mazur, A. Mertins","doi":"10.1109/ICME.2015.7177439","DOIUrl":"https://doi.org/10.1109/ICME.2015.7177439","url":null,"abstract":"Audio event detection has been an active field of research in recent years. However, most of the proposed methods, if not all, analyze and detect complete events and little attention has been paid for early detection. In this paper, we present a system which enables early audio event detection in continuous audio recordings in which an event can be reliably recognized when only a partial duration is observed. Our evaluation on the ITC-Irst database, one of the standard database of the CLEAR 2006 evaluation, shows that: on one hand, the proposed system outperforms the best baseline system by 16% and 8% in terms of detection error rate and detection accuracy respectively; on the other hand, even partial events are enough to achieve the performance that is obtainable when the whole events are observed.","PeriodicalId":146271,"journal":{"name":"2015 IEEE International Conference on Multimedia and Expo (ICME)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133472390","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}