Optimal camera selection in vision networks for shape approximation
Pub Date: 2008-11-05 | DOI: 10.1109/MMSP.2008.4665047
M. Morbée, L. Tessens, Huang Lee, W. Philips, H. Aghajan
Within a camera network, the contribution of a camera to the observation of a scene depends on its viewpoint and on the scene configuration. This is a dynamic property, as the scene content is subject to change over time. Automatically selecting a subset of cameras that contributes significantly to the desired observation of a scene can be of great value in reducing the amount of transmitted or stored image data. In this work, we propose low data rate schemes to select from a vision network a subset of cameras that provides a good frontal observation of the persons in the scene and allows for the best approximation of their 3D shape. We also investigate to what degree these low data rates trade off the quality of the reconstructed 3D shapes.
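As a rough illustration of the kind of geometric criterion such a selection scheme could build on (the paper's actual schemes operate on low-rate data sent by the cameras; the known camera positions, facing direction, and cosine scoring below are illustrative assumptions, not the authors' method):

```python
import numpy as np

def select_cameras(cam_positions, person_pos, facing_dir, k=2):
    """Rank cameras by how frontal their view of a person is and
    return the indices of the k best ones (illustrative sketch).

    cam_positions : (N, 2) array of camera x/y positions
    person_pos    : (2,) person location
    facing_dir    : (2,) unit vector of the person's facing direction
    """
    scores = []
    for cam in cam_positions:
        to_cam = cam - person_pos
        to_cam = to_cam / np.linalg.norm(to_cam)
        # Cosine similarity with the facing direction: 1 = fully frontal.
        scores.append(float(np.dot(to_cam, facing_dir)))
    order = np.argsort(scores)[::-1]      # most frontal first
    return order[:k]

cams = np.array([[0.0, 5.0], [5.0, 0.0], [-5.0, 0.0], [0.0, -5.0]])
print(select_cameras(cams, np.array([0.0, 0.0]), np.array([0.0, 1.0]), k=2))
```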
{"title":"Optimal camera selection in vision networks for shape approximation","authors":"M. Morbée, L. Tessens, Huang Lee, W. Philips, H. Aghajan","doi":"10.1109/MMSP.2008.4665047","DOIUrl":"https://doi.org/10.1109/MMSP.2008.4665047","url":null,"abstract":"Within a camera network, the contribution of a camera to the observation of a scene depends on its viewpoint and on the scene configuration. This is a dynamic property, as the scene content is subject to change over time. An automatic selection of a subset of cameras that significantly contributes to the desired observation of a scene can be of great value for the reduction of the amount of transmitted or stored image data. In this work, we propose low data rate schemes to select from a vision network a subset of cameras that provides a good frontal observation of the persons in the scene and allows for the best approximation of their 3D shape. We also investigate to what degree low data rates trade off quality of reconstructed 3D shapes.","PeriodicalId":402287,"journal":{"name":"2008 IEEE 10th Workshop on Multimedia Signal Processing","volume":"64 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-11-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134258022","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pedestrian detection based on multi-modal cooperation
Pub Date: 2008-11-05 | DOI: 10.1109/MMSP.2008.4665065
Yan-ning Zhang, Xiao-min Tong, Xiu-wei Zhang, Jiang-bin Zheng, Jun Zhou, Si-wei You
Pedestrian detection plays an important role in automated surveillance systems. However, it is challenging to detect pedestrians robustly and accurately in a cluttered environment. In this paper, we propose a new cooperative pedestrian detection method using both colour and thermal image sequences, which we compare with a method using only the colour image sequence and with one using multi-modal fusion. Experimental results show that our cooperative detection mechanism yields more accurate pedestrian regions, a lower false alarm rate and higher detection precision. It therefore has broad application prospects in industrial and military fields.
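A minimal sketch of the cooperative intuition, assuming per-modality foreground masks are already available (the AND-fusion rule and the area filter below are illustrative stand-ins, not the paper's method):

```python
import numpy as np
from scipy.ndimage import label

def cooperative_detect(color_fg, thermal_fg, min_area=50):
    """Keep only foreground regions confirmed by both modalities.

    color_fg, thermal_fg : boolean foreground masks of equal shape,
    e.g. from background subtraction on the colour and thermal streams.
    """
    fused = color_fg & thermal_fg            # both sensors must agree
    labels, n = label(fused)                 # connected components
    keep = np.zeros_like(fused)
    for i in range(1, n + 1):
        comp = labels == i
        if comp.sum() >= min_area:           # drop tiny noise blobs
            keep |= comp
    return keep
```

Requiring agreement between the two sensors is what suppresses single-modality false alarms (e.g. shadows in colour, hot pipes in thermal), at the cost of needing both streams to see the pedestrian.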
{"title":"Pedestrian detection based on multi-modal cooperation","authors":"Yan-ning Zhang, Xiao-min Tong, Xiu-wei Zhang, Jiang-bin Zheng, Jun Zhou, Si-wei You","doi":"10.1109/MMSP.2008.4665065","DOIUrl":"https://doi.org/10.1109/MMSP.2008.4665065","url":null,"abstract":"Pedestrian detection plays an important role in automated surveillance system. However, it is challenging to detect pedestrian robustly and accurately in a cluttered environment. In this paper, we propose a new cooperative pedestrian detection method using both colour and thermal image sequences, which is compared with the method using only colour image sequence and that using multi-modal fusion. Experiment results show that our cooperative detection mechanism could get more accurate pedestrian areas, a lower false alarm rate and a higher detection precision. Therefore, it has broad application prospects in the field of industry and military.","PeriodicalId":402287,"journal":{"name":"2008 IEEE 10th Workshop on Multimedia Signal Processing","volume":"23 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-11-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124842571","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A new low complex reference free video quality predictor
Pub Date: 2008-11-05 | DOI: 10.1109/MMSP.2008.4665177
A. Rossholm, B. Lövström
In many applications and environments for mobile communication there is a need for reference-free perceptual quality measurements. In this paper a method for predicting a number of quality metrics is proposed, where the inputs to the prediction are parameters readily available at the receiver side of a communication channel. Since the parameters are extracted from the coded video bit stream, the model can be used in scenarios where quality is normally difficult to estimate because no reference is available, as in streaming video and mobile TV applications. The predictor gives good results for both the PSNR and PEVQ metrics.
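As a hedged sketch of this kind of predictor (the paper's feature set and regression model are not reproduced here; the bitstream parameters and the training values below are made-up placeholders purely for illustration), one can fit a linear map from receiver-side parameters to a quality score:

```python
import numpy as np

# Hypothetical bitstream features per clip: mean QP, bit rate (kbit/s),
# frame rate, and fraction of intra-coded macroblocks.  Values invented.
X = np.array([
    [26.0, 512.0, 25.0, 0.10],
    [32.0, 256.0, 25.0, 0.15],
    [38.0, 128.0, 15.0, 0.25],
    [44.0,  64.0, 15.0, 0.40],
])
y = np.array([38.5, 34.2, 30.1, 26.3])   # placeholder "measured PSNR" (dB)

# Least-squares fit of a linear predictor y ~ X w + b.
A = np.hstack([X, np.ones((len(X), 1))])
w, *_ = np.linalg.lstsq(A, y, rcond=None)

def predict_psnr(features):
    """Predict quality from receiver-side parameters, no reference needed."""
    return float(np.append(features, 1.0) @ w)

print(predict_psnr([30.0, 300.0, 25.0, 0.12]))
```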
{"title":"A new low complex reference free video quality predictor","authors":"A. Rossholm, B. Lövström","doi":"10.1109/MMSP.2008.4665177","DOIUrl":"https://doi.org/10.1109/MMSP.2008.4665177","url":null,"abstract":"In many applications and environments for mobile communication there is a need for reference free perceptual quality measurements. In this paper a method for prediction of a number of quality metrics is proposed, where the input to the prediction is readily available parameters at the receiver side of a communications channel. Since the parameters are extracted from the coded video bit stream the model can be used in user scenarios where it is normally difficult to estimate the quality due to the reference not being available, as in streaming video and mobile TV applications. The predictor turns out to give good results for both the PSNR and the PEVQ metrics.","PeriodicalId":402287,"journal":{"name":"2008 IEEE 10th Workshop on Multimedia Signal Processing","volume":"12 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-11-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125078572","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Image annotation with parametric mixture model based multi-class multi-labeling
Pub Date: 2008-11-05 | DOI: 10.1109/MMSP.2008.4665153
Zhiyong Wang, W. Siu, D. Feng
Image annotation, which labels an image with a set of semantic terms so as to bridge the semantic gap between low-level features and high-level semantics in visual information retrieval, is generally posed as a classification problem. Recently, multi-label classification has been investigated for image annotation, since an image presents rich content and can be associated with multiple concepts (i.e. labels). In this paper, a parametric mixture model based multi-class multi-labeling approach is proposed to tackle image annotation. Instead of building classifiers to learn individual labels exclusively, we model images with parametric mixture models so that the mixture characteristics of labels can be exploited simultaneously in both the training and annotation processes. Our proposed method has been benchmarked against several state-of-the-art methods and achieves promising results.
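A minimal sketch of the mixture intuition, under the strong simplifying assumption that a multi-labeled image's feature vector is modeled as the sum of per-label prototypes (this is a crude stand-in for, not a reproduction of, the actual parametric mixture model):

```python
import numpy as np

def fit_prototypes(features, label_sets, n_labels, iters=100):
    """Estimate one prototype per label, assuming each image's feature
    vector is the sum of the prototypes of its labels (toy model)."""
    d = features.shape[1]
    P = np.random.default_rng(0).normal(size=(n_labels, d))
    for _ in range(iters):                 # simple alternating updates
        for l in range(n_labels):
            rows = [i for i, ys in enumerate(label_sets) if l in ys]
            if rows:
                # Residual of each image after subtracting its other labels.
                resid = [features[i] - sum(P[m] for m in label_sets[i] if m != l)
                         for i in rows]
                P[l] = np.mean(resid, axis=0)
    return P

def annotate(x, P, top=2):
    """Greedy annotation: pick the labels whose prototypes best match x."""
    scores = P @ x
    return list(np.argsort(scores)[::-1][:top])

X = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])   # toy features
Y = [{0}, {1}, {0, 1}]                               # toy label sets
P = fit_prototypes(X, Y, n_labels=2)
print(annotate(np.array([1.0, 1.0]), P))             # -> both labels
```

The point the toy model shares with the paper's approach is that label co-occurrence is handled jointly rather than by one independent classifier per label.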
{"title":"Image annotation with parametric mixture model based multi-class multi-labeling","authors":"Zhiyong Wang, W. Siu, D. Feng","doi":"10.1109/MMSP.2008.4665153","DOIUrl":"https://doi.org/10.1109/MMSP.2008.4665153","url":null,"abstract":"Image annotation, which labels an image with a set of semantic terms so as to bridge the semantic gap between low level features and high level semantics in visual information retrieval, is generally posed as a classification problem. Recently, multi-label classification has been investigated for image annotation since an image presents rich contents and can be associated with multiple concepts (i.e. labels). In this paper, a parametric mixture model based multi-class multi-labeling approach is proposed to tackle image annotation. Instead of building classifiers to learn individual labels exclusively, we model images with parametric mixture models so that the mixture characteristics of labels can be simultaneously exploited in both training and annotation processes. Our proposed method has been benchmarked with several state-of-the-art methods and achieved promising results.","PeriodicalId":402287,"journal":{"name":"2008 IEEE 10th Workshop on Multimedia Signal Processing","volume":"38 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-11-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125142511","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Hybrid frame-recursive block-based distortion estimation model for wireless video transmission
Pub Date: 2008-11-05 | DOI: 10.1109/MMSP.2008.4665179
Werayut Saesue, Jian Zhang, C. Chou
In wireless environments, video quality can be severely degraded by channel errors. Improving error robustness against packet loss in error-prone networks is a critical concern in wireless video networking research. Data partitioning (DP) is an efficient error-resilience tool in video codecs that reduces the effect of transmission errors by reorganizing the coded video bitstream into partitions with different levels of importance. Significant performance improvement can be achieved if DP is jointly optimized with unequal error protection (UEP). This paper proposes a fast and accurate frame-recursive block-based distortion estimation model for the DP tool in H.264/AVC. The accuracy of our model comes from appropriately approximating the error-concealment cross-correlation term (neglected in earlier work to reduce the computation burden) as a function of the first moment of the decoded pixels. Without increasing computational complexity, our proposed distortion model can be applied to both fixed and variable block size intra-prediction and motion compensation. Extensive simulation results demonstrate the accuracy of our estimation algorithm.
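The moment-tracking idea can be illustrated with a toy per-pixel recursion (a generic sketch in the spirit of such estimators, not the paper's hybrid block-based model): with loss probability p a pixel is either decoded correctly or concealed from the previous frame, and propagating the first and second moments of the decoded value yields the expected end-to-end distortion.

```python
def update_moments(m1_prev, m2_prev, recon, p):
    """Propagate E[x] and E[x^2] of a decoded pixel across one frame.

    m1_prev, m2_prev : moments of the co-located pixel in the previous
                       decoded frame (used for concealment on loss)
    recon            : encoder-side reconstruction of the current pixel
    p                : packet loss probability
    """
    m1 = (1 - p) * recon + p * m1_prev           # E[x]
    m2 = (1 - p) * recon ** 2 + p * m2_prev      # E[x^2]
    return m1, m2

def expected_distortion(orig, m1, m2):
    # E[(orig - x)^2] = orig^2 - 2*orig*E[x] + E[x^2]
    return orig ** 2 - 2 * orig * m1 + m2

m1, m2 = 100.0, 100.0 ** 2                       # previous frame decoded exactly
m1, m2 = update_moments(m1, m2, recon=104.0, p=0.1)
print(expected_distortion(105.0, m1, m2))
```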
{"title":"Hybrid frame-recursive block-based distortion estimation model for wireless video transmission","authors":"Werayut Saesue, Jian Zhang, C. Chou","doi":"10.1109/MMSP.2008.4665179","DOIUrl":"https://doi.org/10.1109/MMSP.2008.4665179","url":null,"abstract":"In wireless environments, video quality can be severely degraded due to channel errors. Improving error robustness towards the impact of packet loss in error-prone network is considered as a critical concern in wireless video networking research. Data partitioning (DP) is an efficient error-resilient tool in video codec that is capable of reducing the effect of transmission errors by reorganizing the coded video bitstream into different partitions with different levels of importance. Significant video performance improvement can be achieved if DP is jointly optimized with unequal error protection (UEP). This paper proposes a fast and accurate frame-recursive block-based distortion estimation model for the DP tool in H.264.AVC. The accuracy of our model comes from appropriately approximating the error-concealment cross-correlation term (which is neglected in earlier work in order to reduce computation burden) as a function of the first moment of decoded pixels.Without increasing computation complexity, our proposed distortion model can be applied to both fixed and variable block size intra-prediction and motion compensation. Extensive simulation results are presented to show the accuracy of our estimation algorithm.","PeriodicalId":402287,"journal":{"name":"2008 IEEE 10th Workshop on Multimedia Signal Processing","volume":"9 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-11-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127725081","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Sparse human movement representation and recognition
Pub Date: 2008-11-05 | DOI: 10.1109/MMSP.2008.4665068
Nikolaos Gkalelis, A. Tefas, I. Pitas
In this paper a novel method for human movement representation and recognition is proposed. A movement type is regarded as a unique combination of basic movement patterns, the so-called dynemes. The fuzzy c-means (FCM) algorithm is used to identify the dynemes in the input space and to allow the expression of a posture in terms of these dynemes. In the so-called dyneme space, the sparse posture representations of a movement are combined to represent the movement as a single point in that space, and linear discriminant analysis (LDA) is further employed to increase movement-type discrimination and compactness of representation. This method allows simple Mahalanobis or cosine distance comparison of movements, implicitly taking into account time shifts and internal speed variations, and thus aids the design of a real-time movement recognition algorithm.
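A minimal sketch of the dyneme-space representation, assuming the dyneme centers have already been learned by FCM and omitting the LDA projection:

```python
import numpy as np

def fuzzy_memberships(x, centers, m=2.0):
    """Standard FCM membership of posture vector x to each dyneme
    center, with fuzzifier m (m=2 is the common default)."""
    d = np.linalg.norm(centers - x, axis=1) + 1e-12
    inv = d ** (-2.0 / (m - 1.0))
    return inv / inv.sum()

def movement_vector(postures, centers):
    """Represent a whole movement as the mean of its per-frame dyneme
    memberships: one point in dyneme space, independent of the
    movement's duration (hence robust to internal speed variations)."""
    return np.mean([fuzzy_memberships(p, centers) for p in postures], axis=0)

def cosine_distance(a, b):
    return 1.0 - a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

centers = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 0.0]])  # assumed learned by FCM
walk = [np.array([0.1, 0.1]), np.array([0.9, 1.0]), np.array([1.9, 0.1])]
print(movement_vector(walk, centers))
```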
{"title":"Sparse human movement representation and recognition","authors":"Nikolaos Gkalelis, A. Tefas, I. Pitas","doi":"10.1109/MMSP.2008.4665068","DOIUrl":"https://doi.org/10.1109/MMSP.2008.4665068","url":null,"abstract":"In this paper a novel method for human movement representation and recognition is proposed. A movement type is regarded as a unique combination of basic movement patterns, the so-called dynemes. The fuzzy c-mean (FCM) algorithm is used to identify the dynemes in the input space and allow the expression of a posture in terms of these dynemes. In the so-called dyneme space, the sparse posture representations of a movement are combined to represent the movement as a single point in that space, and linear discriminant analysis (LDA) is further employed to increase movement type discrimination and compactness of representation. This method allows for simple Mahalanobis or cosine distance comparison of movements, taking implicitly into account time shifts and internal speed variations, and, thus, aiding the design of a real-time movement recognition algorithm.","PeriodicalId":402287,"journal":{"name":"2008 IEEE 10th Workshop on Multimedia Signal Processing","volume":"14 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-11-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129586129","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
On the detection and localization of facial occlusions and its use within different scenarios
Pub Date: 2008-11-05 | DOI: 10.1109/MMSP.2008.4665146
Lutz Goldmann, A. Rama, T. Sikora, F. Tarrés
Face analysis is a very active research field, due to its large variety of applications and the different challenges (illumination, pose, expressions or occlusions) the methods need to cope with. Facial occlusions are one of the biggest challenges, since they are difficult to model and strongly influence the performance of subsequent analysis modules. This paper describes a face detection/classification module that detects and localizes faces and any occlusions present, and discusses the use of this additional information within different application scenarios. The approach is evaluated on two databases with realistic occlusions and performs very well on the different detection/classification tasks. It achieves an F-measure of over 97% for face detection and around 86% for component detection. For occlusion detection, the proposed approach reaches a recognition rate above 91% for both faces and components.
{"title":"On the detection and localization of facial occlusions and its use within different scenarios","authors":"Lutz Goldmann, A. Rama, T. Sikora, F. Tarrés","doi":"10.1109/MMSP.2008.4665146","DOIUrl":"https://doi.org/10.1109/MMSP.2008.4665146","url":null,"abstract":"Face analysis is a very active research field, due to its large variety of applications and the different challenges (illumination, pose, expressions or occlusions) the methods need to cope with. Facial occlusions are one of the biggest challenges since they are difficult to model and have a large influence on the performance of subsequent analysis modules. This paper describes a face detection/classification module that allows to detect and localize faces and present occlusions and discusses the use of this additional information within different application scenarios. The approach is evaluated on two databases with realistic occlusions and performs very well for the different detection/classification tasks. It achieves a f-measure of over 97% for face detection and around 86% for component detection. Regarding the occlusion detection, the proposed approach reaches a recognition rate above 91% for both faces and components.","PeriodicalId":402287,"journal":{"name":"2008 IEEE 10th Workshop on Multimedia Signal Processing","volume":"3 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-11-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121188234","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Seamless MDVC in P2P: A transform-domain approach
Pub Date: 2008-11-05 | DOI: 10.1109/MMSP.2008.4665091
Shuyuan Zhu, B. Zeng
Multiple-description coding (MDC) provides an effective way to mitigate the effects of packet errors/losses by making use of multiple channels. Perhaps the most attractive application of MDC is in the peer-to-peer (P2P) scenario, to support simultaneous video streaming to a large population of clients. To this end, a number of multiple-description video coding (MDVC) schemes (both non-scalable and scalable) have been proposed in the past few years. However, almost all non-scalable schemes suffer from the prediction mismatch between the references used at the encoder and decoder sides, whereas all scalable schemes (involving a base layer and some enhancement layers) suffer from the inter-dependency within the enhancement-layer information. In this paper, we propose a transform-domain MDVC method that solves these problems and at the same time offers some other interesting features.
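One classic transform-domain way to form two descriptions, shown purely as a generic illustration (the paper's actual construction is not reproduced here), is to interleave a block's DCT coefficients between the two descriptions:

```python
import numpy as np
from scipy.fft import dct, idct

def make_descriptions(block):
    """Split a block's 2-D DCT coefficients into two descriptions via a
    checkerboard (even/odd) interleave -- a textbook MDC strategy."""
    c = dct(dct(block.T, norm='ortho').T, norm='ortho')
    d1, d2 = np.zeros_like(c), np.zeros_like(c)
    mask = (np.indices(c.shape).sum(axis=0) % 2 == 0)
    d1[mask], d2[~mask] = c[mask], c[~mask]
    return d1, d2

def reconstruct(*descriptions):
    """Central decoder if both descriptions arrive; side decoder (missing
    coefficients treated as zero) if only one arrives."""
    c = sum(descriptions)
    return idct(idct(c.T, norm='ortho').T, norm='ortho')

block = np.arange(16, dtype=float).reshape(4, 4)
d1, d2 = make_descriptions(block)
print(np.allclose(reconstruct(d1, d2), block))   # central: exact
print(reconstruct(d1))                           # side: degraded estimate
```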
{"title":"Seamless MDVC in P2P: A transform-domain approach","authors":"Shuyuan Zhu, B. Zeng","doi":"10.1109/MMSP.2008.4665091","DOIUrl":"https://doi.org/10.1109/MMSP.2008.4665091","url":null,"abstract":"Multiple-description coding (MDC) provides an effective way to mitigate the effects of packet errors/loses by making use of multiple channels. Perhaps, the most attractive application of MDC is in the peer-to-peer (P2P) scenario to support simultaneous video streaming to a large population of clients. To this end, a number of multiple-description video coding (MDVC) schemes (both non-scalable and scalable) were proposed in the past few years. However, almost all non-scalable schemes would suffer from the prediction mismatch between the references used at the encoder and decoder sides; whereas all scalable schemes (involving a base-layer and some enhancement layers) would suffer from the inter-dependency within the enhancement-layer information. In this paper, we propose a transform-domain MDVC method that can solve these problems and at the same time offer some other interesting features.","PeriodicalId":402287,"journal":{"name":"2008 IEEE 10th Workshop on Multimedia Signal Processing","volume":"88 2 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-11-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115619627","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A video analysis framework for surveillance system
Pub Date: 2008-11-05 | DOI: 10.1109/MMSP.2008.4665195
N. Suvonvorn
On-line video processing for surveillance systems is a very challenging problem. The computational complexity of video analysis algorithms and the massive amount of data to be analyzed must be handled under real-time constraints. Moreover, the system needs to satisfy different application-domain criteria, such as scalability, re-configurability, and quality of service. In this paper we propose a flexible and efficient component-based video analysis framework for surveillance systems. Video acquisition, re-configurable video analysis, and video storage are some of the basic components. Component execution and inter-component synchronization are designed to support multi-core and multi-processor architectures, with a multi-threaded implementation on the .NET Framework. Experimental results on real-time motion tracking are presented and discussed.
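A minimal sketch of the component idea, with acquisition, analysis, and storage running as independent threads synchronized through queues so that components can be swapped or replicated per core (the paper targets the .NET Framework; Python is used here only for illustration):

```python
import queue
import threading

def acquisition(out_q, frames):
    for f in frames:                      # stand-in for a camera feed
        out_q.put(f)
    out_q.put(None)                       # end-of-stream sentinel

def analysis(in_q, out_q):
    while (f := in_q.get()) is not None:
        out_q.put(("motion", f))          # stand-in for motion tracking
    out_q.put(None)                       # forward the sentinel

def storage(in_q, results):
    while (r := in_q.get()) is not None:
        results.append(r)                 # stand-in for persisting results

q1, q2, results = queue.Queue(), queue.Queue(), []
threads = [threading.Thread(target=acquisition, args=(q1, range(5))),
           threading.Thread(target=analysis, args=(q1, q2)),
           threading.Thread(target=storage, args=(q2, results))]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(results)
```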
{"title":"A video analysis framework for surveillance system","authors":"N. Suvonvorn","doi":"10.1109/MMSP.2008.4665195","DOIUrl":"https://doi.org/10.1109/MMSP.2008.4665195","url":null,"abstract":"An on-line video processing for surveillance system is a very challenging problem. The computational complexity of video analysis algorithms and the massive amount of data to be analyzed must be considered under real-time constraints. Moreover it needs to satisfy different criteria of application domain, such as, scalability, re-configurability, and quality of service. In this paper we propose a flexible/efficient video analysis framework for surveillance system which is a component-based architecture. The video acquisition, re-configurable video analysis, and video storage are some of the basic components. The component execution and inter-components synchronization are designed for supporting the multi-cores and multi-processors architecture with multi-threading implementation on .NET Framework. Experimental results on real-time motion tracking are presented with discussion.","PeriodicalId":402287,"journal":{"name":"2008 IEEE 10th Workshop on Multimedia Signal Processing","volume":"12 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-11-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123165632","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Camera motion-constraint video codec selection
Pub Date: 2008-11-05 | DOI: 10.1109/MMSP.2008.4665049
A. Krutz, S. Knorr, M. Kunter, T. Sikora
In recent years advanced video codecs have been developed, such as those standardized in MPEG-4. The latest video codec, H.264/AVC, provides compression performance superior to previous standards, but is based on the same basic motion-compensated DCT architecture. However, for certain types of video it has been shown that an object-based video codec can outperform H.264/AVC. Towards a general-purpose object-based video coding system, we present an automated approach to separate a video sequence into sub-sequences according to camera motion type. The sub-sequences are then coded either with an object-based codec or with the common H.264/AVC. By applying different video codecs to different kinds of camera motion, we achieve a higher overall coding gain for the video sequence. In first experimental evaluations, we demonstrate the excellent performance of this approach on two test sequences.
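A hedged sketch of the selection logic (the motion measure, threshold, and codec routing below are illustrative assumptions, not the paper's algorithm): estimate a global motion magnitude per frame, cut the sequence where the camera-motion class changes, and route each sub-sequence to a codec.

```python
def classify(global_motion, thresh=0.5):
    """Toy camera-motion classifier on a per-frame global motion magnitude."""
    return "moving" if global_motion > thresh else "static"

def split_and_route(motions):
    """Cut the sequence at camera-motion class changes and assign a codec
    per sub-sequence (which codec suits which class is an assumption here)."""
    segments, start = [], 0
    for i in range(1, len(motions) + 1):
        if i == len(motions) or classify(motions[i]) != classify(motions[start]):
            label = classify(motions[start])
            codec = "object-based" if label == "moving" else "H.264/AVC"
            segments.append((start, i - 1, codec))   # inclusive frame range
            start = i
    return segments

print(split_and_route([0.1, 0.2, 0.9, 1.1, 0.1]))
# [(0, 1, 'H.264/AVC'), (2, 3, 'object-based'), (4, 4, 'H.264/AVC')]
```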
{"title":"Camera motion-constraint video codec selection","authors":"A. Krutz, S. Knorr, M. Kunter, T. Sikora","doi":"10.1109/MMSP.2008.4665049","DOIUrl":"https://doi.org/10.1109/MMSP.2008.4665049","url":null,"abstract":"In recent years advanced video codecs have been developed, such as standardized in MPEG-4. The latest video codec H.264/AVC provides compression performance superior to previous standards, but is based on the same basic motion-compensated-DCT architecture. However, for certain types of video, it has been shown that it is possible to outperform the H.264/AVC using an object-based video codec. Towards a general-purpose object-based video coding system we present an automated approach to separate a video sequences into sub-sequences regarding its camera motion type. Then, the sub-sequences are coded either with an object-based codec or the common H.264/AVC. Applying different video codecs for different kinds of camera motion, we achieve a higher overall coding gain for the video sequence. In first experimental evaluations, we demonstrate the excellence performance of this approach on two test sequences.","PeriodicalId":402287,"journal":{"name":"2008 IEEE 10th Workshop on Multimedia Signal Processing","volume":"95 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-11-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131445379","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}