Title: Learning-based video super-resolution reconstruction using particle swarm optimization
Pub Date: 2011-12-01 | DOI: 10.1109/MMSP.2011.6093780
Hsuan-Ying Chen, Jin-Jang Leou
In this study, a learning-based video super-resolution (SR) reconstruction approach using particle swarm optimization (PSO) is proposed. First, for each pixel in the “central” reference low-resolution (LR) video frame, a 5×5×5 motion-compensated volume containing five 5×5 motion-compensated patches is extracted and the orientation of the volume is determined. Then, the pixel values of the “central” reference high-resolution (HR) video frame are reconstructed using the corresponding SR reconstruction filtering masks, selected according to the orientation of the volume and the coordinates of the pixels to be reconstructed. To simplify the PSO learning process for determining the weights of the SR reconstruction filtering masks, simple mask flips are employed. Experimental results show that the SR reconstruction quality of the proposed approach is better than that of three comparison approaches, while its computational complexity is higher than that of the two “simple” comparison approaches, NN and bicubic interpolation, and lower than that of the recent comparison approach, modified NLM.
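The abstract does not give the mask sizes or PSO settings beyond the 5×5×5 volume, so the following is only a minimal NumPy sketch of how PSO could learn the weights of one reconstruction filtering mask from LR/HR training pairs; the synthetic data, swarm size, and inertia/acceleration constants are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical training data: each row of X holds the pixels of one
# motion-compensated LR volume (flattened), y holds the target HR pixel.
n_samples, n_taps = 500, 25          # illustrative sizes, not the paper's
X = rng.normal(size=(n_samples, n_taps))
y = X @ rng.normal(size=n_taps) + 0.05 * rng.normal(size=n_samples)

def fitness(weights):
    """Mean squared reconstruction error of one candidate mask."""
    return np.mean((X @ weights - y) ** 2)

# Plain global-best PSO with standard (assumed) parameters.
n_particles, n_iter = 30, 200
w_inertia, c1, c2 = 0.7, 1.5, 1.5

pos = rng.uniform(-1.0, 1.0, size=(n_particles, n_taps))
vel = np.zeros_like(pos)
pbest = pos.copy()
pbest_val = np.array([fitness(p) for p in pos])
gbest = pbest[np.argmin(pbest_val)].copy()

for _ in range(n_iter):
    r1 = rng.random((n_particles, n_taps))
    r2 = rng.random((n_particles, n_taps))
    vel = w_inertia * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (gbest - pos)
    pos = pos + vel
    vals = np.array([fitness(p) for p in pos])
    improved = vals < pbest_val
    pbest[improved], pbest_val[improved] = pos[improved], vals[improved]
    gbest = pbest[np.argmin(pbest_val)].copy()

print("learned mask weights:", np.round(gbest, 3))
print("final MSE:", fitness(gbest))
```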
{"title":"Learning-based video super-resolution reconstruction using particle swarm optimization","authors":"Hsuan-Ying Chen, Jin-Jang Leou","doi":"10.1109/MMSP.2011.6093780","DOIUrl":"https://doi.org/10.1109/MMSP.2011.6093780","url":null,"abstract":"In this study, a learning-based video super-resolution (SR) reconstruction approach using particle swarm optimization (PSO) is proposed. First, a 5×5×5 motion-compensated volume containing five 5×5 motion-compensated patches is extracted and the orientation of the volume is determined for each pixel in the “central” reference low-resolution (LR) video frame. Then, the pixel values of the “central” reference high-resolution (HR) video frame are reconstructed by using the corresponding SR reconstruction filtering masks, based on the orientation of the volume and the coordinates of the pixels to be reconstructed. To simplify the PSO learning processes for determining the weights in SR reconstruction filtering masks, simple mask flippings are employed. Based on the experimental results obtained in this study, the SR reconstruction results of the proposed approach are better than those of three comparison approaches, whereas the computational complexity of the proposed approach is higher than those of two “simple” comparison approaches, NN and Bicubic, and lower than that of the recent comparison approach, modified NLM.","PeriodicalId":214459,"journal":{"name":"2011 IEEE 13th International Workshop on Multimedia Signal Processing","volume":"71 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132347644","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Title: Image super-resolution via feature-based affine transform
Pub Date: 2011-12-01 | DOI: 10.1109/MMSP.2011.6093845
Chih-Chung Hsu, Chia-Wen Lin
State-of-the-art image super-resolution methods usually rely on searching a comprehensive dataset for appropriate high-resolution (HR) patch candidates to achieve good visual quality in the reconstructed image. Exploiting different scales and orientations in images can effectively enrich a dataset; a large dataset, however, usually leads to high computational complexity and memory requirements, which makes the implementation impractical. This paper proposes a universal framework for enriching the dataset of search-based super-resolution schemes at reasonable computation and memory cost. Toward this end, the proposed method first extracts features at multiple scales and orientations of patches based on SIFT (scale-invariant feature transform) descriptors and then uses the extracted features to search the dataset for the best-matching HR patch(es). Once matching features are found, the retrieved HR patch is aligned with the low-resolution (LR) patch using homography estimation. Experimental results demonstrate that the proposed method achieves significant subjective and objective improvements when integrated with several state-of-the-art image super-resolution methods, without significantly increasing the cost.
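As an illustration of the feature-matching and alignment step described above, here is a hedged OpenCV sketch: SIFT features of an upsampled LR patch are matched against a candidate HR patch, a homography is estimated with RANSAC, and the candidate is warped into the LR patch's geometry. The function name, the 0.75 ratio-test threshold, and the assumption of 8-bit grayscale patches are mine; the dataset search and SR fusion stages of the paper are not reproduced.

```python
import cv2
import numpy as np

def align_hr_patch(lr_patch, hr_candidate, scale=2):
    """Align an HR candidate patch to an LR query patch via SIFT + homography.

    Expects 8-bit grayscale patches; only illustrates the alignment step."""
    # Upsample the LR patch so both images live at roughly the same scale.
    query = cv2.resize(lr_patch, None, fx=scale, fy=scale,
                       interpolation=cv2.INTER_CUBIC)

    sift = cv2.SIFT_create()                      # needs OpenCV >= 4.4
    kq, dq = sift.detectAndCompute(query, None)
    kc, dc = sift.detectAndCompute(hr_candidate, None)
    if dq is None or dc is None:
        return None

    # Lowe's ratio-test matching between the two descriptor sets.
    matcher = cv2.BFMatcher(cv2.NORM_L2)
    matches = matcher.knnMatch(dc, dq, k=2)
    good = [pair[0] for pair in matches
            if len(pair) == 2 and pair[0].distance < 0.75 * pair[1].distance]
    if len(good) < 4:
        return None

    src = np.float32([kc[m.queryIdx].pt for m in good]).reshape(-1, 1, 2)
    dst = np.float32([kq[m.trainIdx].pt for m in good]).reshape(-1, 1, 2)
    H, _ = cv2.findHomography(src, dst, cv2.RANSAC, 3.0)
    if H is None:
        return None

    # Warp the HR candidate onto the (upsampled) LR patch geometry.
    return cv2.warpPerspective(hr_candidate, H, query.shape[:2][::-1])
```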
{"title":"Image super-resolution via feature-based affine transform","authors":"Chih-Chung Hsu, Chia-Wen Lin","doi":"10.1109/MMSP.2011.6093845","DOIUrl":"https://doi.org/10.1109/MMSP.2011.6093845","url":null,"abstract":"State-of-the-art image super-resolution methods usually rely on search in a comprehensive dataset for appropriate high-resolution patch candidates to achieve good visual quality of reconstructed image. Exploiting different scales and orientations in images can effectively enrich a dataset. A large dataset, however, usually leads to high computational complexity and memory requirement, which makes the implementation impractical. This paper proposes a universal framework for enriching the dataset for search-based super-resolution schemes with reasonable computation and memory cost. Toward this end, the proposed method first extracts important features with multiple scales and orientations of patches based on the SIFT (Scale-invariant feature transform) descriptors and then use the extracted features to search in the dataset for the best-match HR patch(es). Once the matched features of patches are found, the found HR patch will be aligned with LR patch using homography estimation. Experimental results demonstrate that the proposed method achieves significant subjective and objective improvement when integrated with several state-of-the-art image super-resolution methods without significantly increasing the cost.","PeriodicalId":214459,"journal":{"name":"2011 IEEE 13th International Workshop on Multimedia Signal Processing","volume":"116 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114568359","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Title: Tap-to-search: Interactive and contextual visual search on mobile devices
Pub Date: 2011-12-01 | DOI: 10.1109/MMSP.2011.6093802
Ning Zhang, Tao Mei, Xiansheng Hua, L. Guan, Shipeng Li
Mobile visual search has been an emerging topic for both the research and industrial communities. Among various methods, visual search has the merit of providing an alternative where text and voice search are not applicable. This paper proposes an interactive “tap-to-search” approach that combines the user's intention, expressed by selecting regions of interest via “tap” actions on the mobile touch screen, with a recognition-by-search mechanism over a large-scale image database. An automatic image segmentation technique is applied to provide region candidates. A visual-vocabulary-tree-based search is adopted, incorporating rich contextual information collected from mobile sensors. The proposed approach is evaluated on an image dataset of two million images. We demonstrate that, using GPS contextual information, the approach achieves satisfactory results under standard information retrieval evaluation.
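A minimal sketch of the retrieval side, under stated assumptions: the paper's vocabulary tree is replaced here by a flat visual vocabulary, and GPS context is used as a simple distance filter before ranking by bag-of-words similarity. All sizes, coordinates, and the 5 km radius are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

def quantize(descriptors, vocab):
    """Assign each local descriptor to its nearest visual word (a flat
    vocabulary here; the paper's vocabulary tree serves the same purpose)."""
    d2 = ((descriptors[:, None, :] - vocab[None, :, :]) ** 2).sum(-1)
    return d2.argmin(axis=1)

def bow_histogram(descriptors, vocab):
    words = quantize(descriptors, vocab)
    hist = np.bincount(words, minlength=len(vocab)).astype(float)
    return hist / (np.linalg.norm(hist) + 1e-12)

def haversine_km(lat1, lon1, lat2, lon2):
    lat1, lon1, lat2, lon2 = map(np.radians, (lat1, lon1, lat2, lon2))
    a = np.sin((lat2 - lat1) / 2) ** 2 + \
        np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2
    return 6371.0 * 2 * np.arcsin(np.sqrt(a))

# Synthetic stand-ins: 1000 visual words, a database of 50 images with
# precomputed BoW histograms and GPS tags, and one tapped query region.
vocab = rng.normal(size=(1000, 128))
db_hists = rng.random((50, 1000))
db_hists /= np.linalg.norm(db_hists, axis=1, keepdims=True)
db_gps = rng.uniform([43.6, -79.5], [43.8, -79.3], size=(50, 2))

query_desc = rng.normal(size=(200, 128))   # descriptors from the tapped region
query_hist = bow_histogram(query_desc, vocab)
query_gps = (43.7, -79.4)

# Rank by visual similarity, keeping only images captured within 5 km.
dist_km = haversine_km(query_gps[0], query_gps[1], db_gps[:, 0], db_gps[:, 1])
scores = db_hists @ query_hist
scores[dist_km > 5.0] = -np.inf
print("top-5 database images:", np.argsort(scores)[::-1][:5])
```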
{"title":"Tap-to-search: Interactive and contextual visual search on mobile devices","authors":"Ning Zhang, Tao Mei, Xiansheng Hua, L. Guan, Shipeng Li","doi":"10.1109/MMSP.2011.6093802","DOIUrl":"https://doi.org/10.1109/MMSP.2011.6093802","url":null,"abstract":"Mobile visual search has been an emerging topic for both researching and industrial communities. Among various methods, visual search has its merit in providing an alternative solution, where text and voice searches are not applicable. This paper proposes an interactive “tap-to-search” approach utilizing both individual's intention in selecting interested regions via “tap” actions on the mobile touch screen, as well as a visual recognition by search mechanism in a large-scale image database. Automatic image segmentation technique is applied in order to provide region candidates. Visual vocabulary tree based search is adopted by incorporating rich contextual information which are collected from mobile sensors. The proposed approach has been conducted on an image dataset with the scale of two million. We demonstrated that using GPS contextual information, such an approach can further achieve satisfactory results with the standard information retrieval evaluation.","PeriodicalId":214459,"journal":{"name":"2011 IEEE 13th International Workshop on Multimedia Signal Processing","volume":"33 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126911067","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Title: Smoke detection in videos using Non-Redundant Local Binary Pattern-based features
Pub Date: 2011-12-01 | DOI: 10.1109/MMSP.2011.6093844
Hongda Tian, W. Li, P. Ogunbona, D. Nguyen, Ce Zhan
This paper presents a novel, low-complexity method for real-time video-based smoke detection. As a local texture operator, the Non-Redundant Local Binary Pattern (NRLBP) is more discriminative and more robust to illumination changes than the original Local Binary Pattern (LBP), and is therefore employed to encode the appearance information of smoke. The Non-Redundant Local Motion Binary Pattern (NRLMBP), computed on the difference image of consecutive frames, is introduced to capture the motion information of smoke. Experimental results show that NRLBP outperforms the original LBP in the smoke detection task. Furthermore, the combination of NRLBP and NRLMBP, which can be considered a spatial-temporal descriptor of smoke, leads to a remarkable improvement in detection performance.
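For concreteness, a small NumPy sketch of the two descriptors, assuming the common definition of the non-redundant LBP in which a code and its complement share a bin (min(code, 255 - code)); the region detection and classifier stages of the paper are not shown, and the toy frames are synthetic.

```python
import numpy as np

def lbp_codes(img):
    """8-neighbour LBP codes for the interior pixels of a grayscale image."""
    img = img.astype(np.int32)
    c = img[1:-1, 1:-1]
    neighbours = [img[0:-2, 0:-2], img[0:-2, 1:-1], img[0:-2, 2:],
                  img[1:-1, 2:],   img[2:,   2:],   img[2:,   1:-1],
                  img[2:,   0:-2], img[1:-1, 0:-2]]
    codes = np.zeros_like(c)
    for bit, nb in enumerate(neighbours):
        codes |= (nb >= c).astype(np.int32) << bit
    return codes

def nrlbp_histogram(img, bins=128):
    """Non-redundant LBP: a code and its complement are mapped to the same
    bin, i.e. min(code, 255 - code), halving the histogram size."""
    codes = lbp_codes(img)
    nr = np.minimum(codes, 255 - codes)
    hist = np.bincount(nr.ravel(), minlength=bins).astype(float)
    return hist / hist.sum()

def nrlmbp_histogram(frame_t, frame_t1, bins=128):
    """Motion counterpart: the same operator applied to the absolute
    difference of two consecutive frames."""
    diff = np.abs(frame_t.astype(np.int32) - frame_t1.astype(np.int32))
    return nrlbp_histogram(diff, bins)

# A candidate smoke region would then be described by the concatenation of
# the two histograms and passed to a classifier.
rng = np.random.default_rng(0)
f0 = rng.integers(0, 256, size=(64, 64))
f1 = np.clip(f0 + rng.integers(-10, 10, size=(64, 64)), 0, 255)
feature = np.concatenate([nrlbp_histogram(f1), nrlmbp_histogram(f1, f0)])
print(feature.shape)   # (256,)
```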
{"title":"Smoke detection in videos using Non-Redundant Local Binary Pattern-based features","authors":"Hongda Tian, W. Li, P. Ogunbona, D. Nguyen, Ce Zhan","doi":"10.1109/MMSP.2011.6093844","DOIUrl":"https://doi.org/10.1109/MMSP.2011.6093844","url":null,"abstract":"This paper presents a novel and low complexity method for real-time video-based smoke detection. As a local texture operator, Non-Redundant Local Binary Pattern (NRLBP) is more discriminative and robust to illumination changes in comparison with original Local Binary Pattern (LBP), thus is employed to encode the appearance information of smoke. Non-Redundant Local Motion Binary Pattern (NRLMBP), which is computed on the difference image of consecutive frames, is introduced to capture the motion information of smoke. Experimental results show that NRLBP outperforms the original LBP in the smoke detection task. Furthermore, the combination of NRLBP and NRLMBP, which can be considered as a spatial-temporal descriptor of smoke, can lead to remarkable improvement on detection performance.","PeriodicalId":214459,"journal":{"name":"2011 IEEE 13th International Workshop on Multimedia Signal Processing","volume":"135 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121627923","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Title: OpenCL implementation of motion estimation for cloud video processing
Pub Date: 2011-12-01 | DOI: 10.1109/MMSP.2011.6093846
R. Gaetano, B. Pesquet-Popescu
With the rise of cloud computing infrastructures on one side and the increased accessibility of parallel computing devices such as GPUs and multi-core CPUs on the other, parallel programming has recently gained renewed interest. This is particularly true in the domain of video coding, where the complexity and running time of the algorithms tend to limit access to the core technology. In this work, we focus on motion estimation, well known to be the most time-consuming step of the majority of video coding techniques. Relying on the OpenCL standard, which provides a cross-platform framework for parallel programming, we propose a scalable CPU/GPU implementation of the full-search block-matching (FSBM) motion estimation algorithm and study its performance, including the issues raised by the use of OpenCL.
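A serial NumPy reference of full-search block matching is sketched below to make the data-parallel structure explicit: every block's search is independent, which is what an OpenCL implementation maps onto work-items. This is a generic illustration, not the authors' OpenCL code, and block and search-range sizes are arbitrary.

```python
import numpy as np

def full_search_block_matching(cur, ref, block=16, search=8):
    """Exhaustive (full-search) block matching with SAD as the cost.

    Each block is processed independently of all others, so the two outer
    loops are trivially parallelizable; this is the serial reference."""
    h, w = cur.shape
    mvs = np.zeros((h // block, w // block, 2), dtype=np.int32)
    for by in range(0, h - block + 1, block):
        for bx in range(0, w - block + 1, block):
            cur_blk = cur[by:by + block, bx:bx + block].astype(np.int32)
            best_sad, best_mv = np.inf, (0, 0)
            for dy in range(-search, search + 1):
                for dx in range(-search, search + 1):
                    y, x = by + dy, bx + dx
                    if y < 0 or x < 0 or y + block > h or x + block > w:
                        continue
                    ref_blk = ref[y:y + block, x:x + block].astype(np.int32)
                    sad = np.abs(cur_blk - ref_blk).sum()
                    if sad < best_sad:
                        best_sad, best_mv = sad, (dy, dx)
            mvs[by // block, bx // block] = best_mv
    return mvs
```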
{"title":"OpenCL implementation of motion estimation for cloud video processing","authors":"R. Gaetano, B. Pesquet-Popescu","doi":"10.1109/MMSP.2011.6093846","DOIUrl":"https://doi.org/10.1109/MMSP.2011.6093846","url":null,"abstract":"With the raise of cloud computing infrastructures on one side and the increased accessibility of parallel computational devices on the other, such as GPUs and multi-core CPUs, parallel programming has recently gained a renewed interest. This is particularly true in the domain of video coding, where the complexity and time consumption of the algorithms tend to limit the access to the core technology. In this work, we focus on the motion estimation problem, well-known to be the most time consuming step of a majority of video coding techniques. By relying on the use of the OpenCL standard, which provides a cross-platform framework for parallel programming, we propose here a scalable CPU/GPU implementation of the full search motion estimation algorithm (FSBM), and study its performances also with respect to the issues raised by the use of OpenCL.","PeriodicalId":214459,"journal":{"name":"2011 IEEE 13th International Workshop on Multimedia Signal Processing","volume":"37 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121711916","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Title: Optimal resource allocation for video streaming over cognitive radio networks
Pub Date: 2011-12-01 | DOI: 10.1109/MMSP.2011.6093822
Bo Guan, Yifeng He
Cognitive Radio (CR) is a new paradigm in wireless communications for enhancing the utilization of limited spectrum resources. In this paper, we investigate the resource allocation problem for video streaming over spectrum-underlay cognitive radio networks, where secondary users and primary users transmit data simultaneously in a common frequency band. We formulate resource allocation as an optimization problem that jointly optimizes the source rate, the transmission rate, and the transmission power of each secondary session to provide QoS guarantees to the video streaming sessions. The problem is cast as a Geometric Programming (GP) problem, which can be solved efficiently. In simulations, we demonstrate that the proposed scheme achieves a lower Packet Loss Rate (PLR) and queuing delay, and thus higher video quality for the video streaming sessions, than the uniform scheme.
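The paper's joint source-rate/transmission-rate/power formulation is not reproduced here; the sketch below only shows, with CVXPY's geometric-programming mode, the kind of power-control GP this class of problem reduces to: minimize total secondary transmit power subject to minimum-SINR constraints. The link gains, noise level, and SINR target are made-up values.

```python
import numpy as np
import cvxpy as cp

rng = np.random.default_rng(0)

n = 4                                    # number of secondary sessions (assumed)
G = rng.uniform(0.05, 0.2, size=(n, n))  # cross-link gains (illustrative)
np.fill_diagonal(G, rng.uniform(0.8, 1.0, size=n))
noise = 1e-2
sinr_min = 2.0                           # minimum SINR required per session

p = cp.Variable(n, pos=True)             # transmit powers (positive => GP)

constraints = []
for i in range(n):
    interference = cp.sum(cp.hstack(
        [G[i, j] * p[j] for j in range(n) if j != i]))
    # posynomial <= monomial: a valid geometric-programming constraint
    constraints.append(sinr_min * (noise + interference) <= G[i, i] * p[i])

prob = cp.Problem(cp.Minimize(cp.sum(p)), constraints)
prob.solve(gp=True)                      # CVXPY's log-log convex (GP) mode
print("status:", prob.status)
print("powers:", np.round(p.value, 4))
```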
{"title":"Optimal resource allocation for video streaming over cognitive radio networks","authors":"Bo Guan, Yifeng He","doi":"10.1109/MMSP.2011.6093822","DOIUrl":"https://doi.org/10.1109/MMSP.2011.6093822","url":null,"abstract":"Cognitive Radio (CR) is a new paradigm in wireless communications to enhance utilization of limited spectrum resources. In this paper, we investigate the resource allocation problem for video streaming over spectrum underlay cognitive radio networks where secondary users and primary users transmit data simultaneously in a common frequency band. We formulate the resource allocation problem into an optimization problem, which jointly optimizes the source rate, the transmission rate, and the transmission power at each secondary session to provide QoS guarantee to the video streaming sessions. The optimization problem is formulated into a Geometric Programming (GP) problem, which can be solved efficiently. In the simulations, we demonstrate that the proposed scheme can achieve a lower Packet Loss Rate (PLR) and queuing delay, thus leading to a higher video quality for the video streaming sessions, compared to the uniform scheme.","PeriodicalId":214459,"journal":{"name":"2011 IEEE 13th International Workshop on Multimedia Signal Processing","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130324267","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Title: Depth map coding using graph based transform and transform domain sparsification
Pub Date: 2011-12-01 | DOI: 10.1109/MMSP.2011.6093810
Gene Cheung, Woo-Shik Kim, Antonio Ortega, Junichi Ishida, Akira Kubota
Depth map compression is important for the compact “texture-plus-depth” representation of a 3D scene, in which texture and depth maps captured from multiple camera viewpoints are coded into the same format. Having received this format, the decoder can synthesize any novel intermediate view from the texture and depth maps of two neighboring captured views via depth-image-based rendering (DIBR). In this paper, we combine two previously proposed depth map compression techniques that promote sparsity in the transform domain for coding gain, the graph-based transform (GBT) and transform domain sparsification (TDS), under one unified optimization framework. The key to combining GBT and TDS is to adaptively select, per block, the simplest transform that leads to a sparse representation. For blocks without detected prominent edges, the synthesized view's distortion sensitivity to depth map errors is low, and TDS can effectively identify a sparse depth signal in the fixed DCT domain within a large search space of good signals with small synthesized-view distortion. For blocks with detected prominent edges, the synthesized view's distortion sensitivity to depth map errors is high, and the search space of good depth signals in which TDS can find sparse DCT-domain representations is small. In this case, GBT is first performed on a graph defined by all detected edges, so that filtering across edges is avoided, yielding a sparsity count ρ in the GBT domain. We then incrementally add the most important edge to an initial edge-free graph, each time performing TDS in the resulting GBT domain, until the same sparsity count ρ is achieved. Experiments on two sets of multiview images showed gains of up to 0.7 dB in PSNR in synthesized view quality compared with previous techniques that employ either GBT or TDS alone.
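A minimal sketch of the GBT alone, under assumptions: one square depth block, a 4-connected pixel graph, links broken by simple intensity thresholding as a stand-in for edge detection, and the transform basis taken as the eigenvectors of the graph Laplacian. The TDS search and the incremental edge-addition loop of the paper are not reproduced.

```python
import numpy as np

def gbt_basis(block, edge_threshold=20):
    """Graph-based transform basis for one square depth block.

    Pixels are nodes of a 4-connected graph; links whose endpoints differ by
    more than `edge_threshold` (a detected depth edge, approximated here by
    thresholding) are removed so that filtering never crosses the edge.
    The GBT basis is the eigenvector matrix of the graph Laplacian."""
    n = block.shape[0]
    N = n * n
    A = np.zeros((N, N))
    idx = lambda r, c: r * n + c
    for r in range(n):
        for c in range(n):
            for dr, dc in ((0, 1), (1, 0)):       # right and down neighbours
                rr, cc = r + dr, c + dc
                if rr < n and cc < n and \
                        abs(int(block[r, c]) - int(block[rr, cc])) <= edge_threshold:
                    A[idx(r, c), idx(rr, cc)] = A[idx(rr, cc), idx(r, c)] = 1.0
    L = np.diag(A.sum(axis=1)) - A                # graph Laplacian
    _, eigvecs = np.linalg.eigh(L)                # columns = GBT basis
    return eigvecs

# Toy 8x8 depth block with a sharp vertical edge down the middle: the GBT
# represents it with only a couple of non-zero coefficients.
block = np.full((8, 8), 40, dtype=np.uint8)
block[:, 4:] = 160
V = gbt_basis(block)
coeffs = V.T @ block.astype(float).ravel()        # forward GBT of the block
print("near-zero coefficients:", int(np.sum(np.abs(coeffs) < 1e-6)), "of", coeffs.size)
```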
{"title":"Depth map coding using graph based transform and transform domain sparsification","authors":"Gene Cheung, Woo-Shik Kim, Antonio Ortega, Junichi Ishida, Akira Kubota","doi":"10.1109/MMSP.2011.6093810","DOIUrl":"https://doi.org/10.1109/MMSP.2011.6093810","url":null,"abstract":"Depth map compression is important for compact “texture-plus-depth” representation of a 3D scene, where texture and depth maps captured from multiple camera viewpoints are coded into the same format. Having received such format, the decoder can synthesize any novel intermediate view using texture and depth maps of two neighboring captured views via depth-image-based rendering (DIBR). In this paper, we combine two previously proposed depth map compression techniques that promote sparsity in the transform domain for coding gain-graph-based transform (GBT) and transform domain sparsification (TDS) — together under one unified optimization framework. The key to combining GBT and TDS is to adaptively select the simplest transform per block that leads to a sparse representation. For blocks without detected prominent edges, the synthesized view's distortion sensitivity to depth map errors is low, and TDS can effectively identify a sparse depth signal in fixed DCT domain within a large search space of good signals with small synthesized view distortion. For blocks with detected prominent edges, the synthesized view's distortion sensitivity to depth map errors is high, and the search space of good depth signals for TDS to find sparse representations in DCT domain is small. In this case, GBT is first performed on a graph defining all detected edges, so that filtering across edges is avoided, resulting in a sparsity count ρ in GBT. We then incrementally add the most important edge to an initial no-edge graph, each time performing TDS in the resulting GBT domain, until the same sparsity count ρ is achieved. Experimentation on two sets of multiview images showed gain of up to 0.7dB in PSNR in synthesized view quality compared to previous techniques that employ either GBT or TDS alone.","PeriodicalId":214459,"journal":{"name":"2011 IEEE 13th International Workshop on Multimedia Signal Processing","volume":"16 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130661250","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Title: Reducing complexity in H.264/AVC motion estimation by using a GPU
Pub Date: 2011-12-01 | DOI: 10.1109/MMSP.2011.6093785
Rafael Rodríguez-Sánchez, José Luis Martínez, G. Fernández-Escribano, J. M. Claver, J. L. Sánchez
To reduce the temporal redundancy of video sequences, H.264/AVC applies a mode decision technique with high computational complexity. Several algorithms have been proposed in the literature in recent years with the aim of accelerating this part of the encoding process. Recently, with the emergence of many-core processors and accelerators, a new approach can be adopted for reducing the complexity of the H.264/AVC encoding algorithm. This paper focuses on reducing the complexity of inter prediction in H.264/AVC and proposes a GPU-based implementation using CUDA. Experimental results show that the proposed approach reduces the complexity by as much as 99% (a 100x speedup) while maintaining coding efficiency.
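One reason inter prediction maps well to GPUs is that small SADs can be computed once and reused for every H.264 partition size; the NumPy sketch below illustrates that reuse with 4x4 SADs aggregated into 8x8 and 16x16 costs. It is an illustration of the general idea, not the paper's CUDA kernels, and the frame sizes are arbitrary.

```python
import numpy as np

def sad_4x4_grid(cur, ref):
    """SAD of every aligned 4x4 block between two equally sized frames."""
    d = np.abs(cur.astype(np.int32) - ref.astype(np.int32))
    h, w = d.shape
    return d.reshape(h // 4, 4, w // 4, 4).sum(axis=(1, 3))

def aggregate_partitions(sad4):
    """Reuse the 4x4 SADs to obtain SADs of larger H.264 partitions
    (8x8 and 16x16 shown; 16x8, 8x16, etc. follow the same pattern)."""
    h, w = sad4.shape
    sad8 = sad4.reshape(h // 2, 2, w // 2, 2).sum(axis=(1, 3))
    sad16 = sad8.reshape(h // 4, 2, w // 4, 2).sum(axis=(1, 3))
    return sad8, sad16

rng = np.random.default_rng(0)
cur = rng.integers(0, 256, size=(64, 64))
ref = rng.integers(0, 256, size=(64, 64))
s4 = sad_4x4_grid(cur, ref)           # one cost per 4x4 block
s8, s16 = aggregate_partitions(s4)
print(s4.shape, s8.shape, s16.shape)  # (16, 16) (8, 8) (4, 4)
```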
{"title":"Reducing complexity in H.264/AVC motion estimation by using a GPU","authors":"Rafael Rodríguez-Sánchez, José Luis Martínez, G. Fernández-Escribano, J. M. Claver, J. L. Sánchez","doi":"10.1109/MMSP.2011.6093785","DOIUrl":"https://doi.org/10.1109/MMSP.2011.6093785","url":null,"abstract":"H.264/AVC applies a complex mode decision technique that has high computational complexity in order to reduce the temporal redundancies of video sequences. Several algorithms have been proposed in the literature in recent years with the aim of accelerating this part of the encoding process. Recently, with the emergence of many-core processors or accelerators, a new approach can be adopted for reducing the complexity of the H.264/AVC encoding algorithm. This paper focuses on reducing the inter prediction complexity adopted in H.264/AVC and proposes a GPU-based implementation using CUDA. Experimental results show that the proposed approach reduces the complexity by as much as 99% (100x of speedup) while maintaining the coding efficiency.","PeriodicalId":214459,"journal":{"name":"2011 IEEE 13th International Workshop on Multimedia Signal Processing","volume":"82 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131209112","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Title: Optimized reference frame selection for video coding by cloud
Pub Date: 2011-12-01 | DOI: 10.1109/MMSP.2011.6093770
Bin Li, Jizheng Xu, Houqiang Li, Feng Wu
We investigate how to improve video coding efficiency via optimized reference frame selection using large-scale computation resources, e.g., a cloud. We first formulate the optimization problem for reference frame selection in video coding, which can be simplified to a manageable level. Given the maximum number of reference frames for encoding one frame, we give the upper bound of the coding efficiency on the High Efficiency Video Coding (HEVC) platform, which, although ideal, may require a huge amount of reference frame buffering at the decoder. Then we give a solution and the corresponding performance when the reference frame buffer size at the decoder is constrained. Experimental results show that when the number of reference frames is four, the proposed encoding scheme can achieve up to 16.9% bit-saving compared to HEVC, the state-of-the-art video coding system. The proposed encoding scheme is standard-compliant and can also be applied to H.264/AVC to improve coding efficiency.
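As a toy illustration only (the paper's actual formulation and dependency handling are not reproduced), the sketch below contrasts an unconstrained per-frame reference choice, which corresponds to the ideal upper bound, with one restricted to a small decoder buffer, given a hypothetical table of per-candidate coding costs that a cloud would obtain by trial encodings.

```python
import numpy as np

rng = np.random.default_rng(0)

n_frames = 30
# Hypothetical rate cost of coding frame f with frame r (r < f) as reference;
# in the cloud setting these costs come from actually encoding each candidate,
# which is where the large-scale computation is spent.
cost = rng.uniform(1000, 2000, size=(n_frames, n_frames))

def select_references(cost, buffer_size=None):
    """Pick, for every frame, the single cheapest reference among the allowed
    candidates; buffer_size=None gives the unconstrained upper bound, while a
    finite value models the decoder's reference-buffer constraint."""
    total, picks = 0.0, []
    for f in range(1, cost.shape[0]):
        lo = 0 if buffer_size is None else max(0, f - buffer_size)
        r = lo + int(np.argmin(cost[f, lo:f]))
        picks.append(r)
        total += cost[f, r]
    return total, picks

ub, _ = select_references(cost)                    # ideal, unbounded buffer
constrained, _ = select_references(cost, buffer_size=4)
print(f"unconstrained cost {ub:.0f} vs 4-frame buffer {constrained:.0f}")
```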
{"title":"Optimized reference frame selection for video coding by cloud","authors":"Bin Li, Jizheng Xu, Houqiang Li, Feng Wu","doi":"10.1109/MMSP.2011.6093770","DOIUrl":"https://doi.org/10.1109/MMSP.2011.6093770","url":null,"abstract":"We investigate how to improve video coding efficiency via optimized reference frame selection using large-scale computation resources, e.g., a cloud. We first formulate the optimization problem for reference frame selection in video coding, which can be simplified to a manageable level. Given the maximum number of reference frames for encoding one frame, we give the upper bound of the coding efficiency on the High Efficiency Video Coding (HEVC) platform, which, although ideal, may require a huge amount of reference frame buffering at the decoder. Then we give a solution and the corresponding performance when the reference frame buffer size at the decoder is constrained. Experimental results show that when the number of reference frames is four, the proposed encoding scheme can achieve up to 16.9% bit-saving compared to HEVC, the state-of-the-art video coding system. The proposed encoding scheme is standard-compliant and can also be applied to H.264/AVC to improve coding efficiency.","PeriodicalId":214459,"journal":{"name":"2011 IEEE 13th International Workshop on Multimedia Signal Processing","volume":"7 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133567274","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Title: Strategies for orca call retrieval to support collaborative annotation of a large archive
Pub Date: 2011-12-01 | DOI: 10.1109/MMSP.2011.6093798
S. Ness, Alexander Lerch, G. Tzanetakis
The Orchive is a large audio archive of hydrophone recordings of killer whale (Orcinus orca) vocalizations. Researchers and users from around the world can interact with the archive through a collaborative web-based annotation, visualization, and retrieval interface. In addition, a mobile client has been written to crowdsource orca call annotation. In this paper we describe and compare different strategies for the retrieval of discrete orca calls. The results of the automatic analysis are integrated into the user interface, facilitating annotation as well as leveraging the existing annotations for supervised learning. The best strategy achieves a mean average precision of 0.77, with the first retrieved item being relevant 95% of the time, on a dataset of 185 calls belonging to 4 call types.
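Mean average precision, the figure reported above, can be computed from binary relevance lists as follows; the three query lists here are toy data, not the Orchive results.

```python
import numpy as np

def average_precision(relevance):
    """Average precision for one ranked result list (1 = relevant, 0 = not)."""
    relevance = np.asarray(relevance, dtype=float)
    if relevance.sum() == 0:
        return 0.0
    precision_at_k = np.cumsum(relevance) / np.arange(1, len(relevance) + 1)
    return float((precision_at_k * relevance).sum() / relevance.sum())

def mean_average_precision(ranked_lists):
    return float(np.mean([average_precision(r) for r in ranked_lists]))

# Toy example: three queries with binary relevance of the returned calls.
queries = [
    [1, 1, 0, 1, 0],
    [0, 1, 1, 0, 0],
    [1, 0, 0, 0, 1],
]
print("MAP:", round(mean_average_precision(queries), 3))
```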
{"title":"Strategies for orca call retrieval to support collaborative annotation of a large archive","authors":"S. Ness, Alexander Lerch, G. Tzanetakis","doi":"10.1109/MMSP.2011.6093798","DOIUrl":"https://doi.org/10.1109/MMSP.2011.6093798","url":null,"abstract":"The Orchive is a large audio archive of hydrophone recordings of Killer whale (Orcinus orca) vocalizations. Researchers and users from around the world can interact with the archive using a collaborative web-based annotation, visualization and retrieval interface. In addition a mobile client has been written in order to crowdsource Orca call annotation. In this paper we describe and compare different strategies for the retrieval of discrete Orca calls. In addition, the results of the automatic analysis are integrated in the user interface facilitating annotation as well as leveraging the existing annotations for supervised learning. The best strategy achieves a mean average precision of 0.77 with the first retrieved item being relevant 95% of the time in a dataset of 185 calls belonging to 4 types.","PeriodicalId":214459,"journal":{"name":"2011 IEEE 13th International Workshop on Multimedia Signal Processing","volume":"36 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114114901","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}