Human Action Recognition using a Hybrid NTLD Classifier
A. Rani, Sanjeev Kumar, C. Micheloni, G. Foresti. 2010 7th IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS 2010). doi:10.1109/AVSS.2010.11

This work proposes a hybrid classifier to recognize human actions in different contexts. In particular, the proposed hybrid classifier, a neural tree with linear discriminant nodes (NTLD), is a neural tree whose nodes can be either simple perceptrons or recursive Fisher linear discriminant (RFLD) classifiers. A novel technique is introduced to substitute badly trained perceptrons with better-performing linear discriminators. For a given frame, geometrical features are extracted from the skeleton of the human blob (silhouette). These geometrical features are collected over a fixed number of consecutive frames to recognize the corresponding activity. The resulting feature vector is used as input to the NTLD classifier. The performance of the proposed classifier has been evaluated on two available databases.
Automatic Inter-image Homography Estimation from Person Detections
M. Thaler, R. Mörzinger. AVSS 2010. doi:10.1109/AVSS.2010.35

Inter-image homographies are essential for many tasks involving projective geometry. This paper proposes an adaptive approach for estimating correspondences between person detections in a planar scene that, unlike many other RANSAC-based approaches, does not rely on correspondence features. The result is a planar inter-image homography calculated from the estimated point correspondences. The approach is self-configurable and adaptive, and provides robustness over time by exploiting temporal and geometric information. We demonstrate the broad applicability of the proposed approach on a variety of datasets. Improved results compared to a common baseline approach are shown, and the influence of error sources such as missed detections, false detections and non-overlapping fields of view is investigated.
Learning Directed Intention-driven Activities using Co-Clustering
K. Sankaranarayanan, James W. Davis. AVSS 2010. doi:10.1109/AVSS.2010.41

We present a novel approach for discovering directed, intention-driven pedestrian activities across large urban areas. The proposed approach is based on a mutual-information co-clustering technique that simultaneously clusters trajectory start locations in the scene which have similar distributions across stop locations, and vice versa. The clustering assignments are obtained by minimizing the loss of mutual information between a trajectory start-stop association matrix and a compressed co-clustered matrix, after which the scene activities are inferred from the compressed matrix. We demonstrate our approach using a dataset of long-duration trajectories from multiple PTZ cameras covering a large area and show improved results over two other popular trajectory clustering and entry-exit learning approaches.
Multi-Modal Object Tracking using Dynamic Performance Metrics
S. Denman, C. Fookes, S. Sridharan, D. Ryan. AVSS 2010. doi:10.1109/AVSS.2010.16

Intelligent surveillance systems typically use a single visual-spectrum modality for their input. These systems work well in controlled conditions, but often fail when lighting is poor or environmental effects such as shadows, dust or smoke are present. Thermal-spectrum imagery is not as susceptible to environmental effects; however, thermal imaging sensors are more sensitive to noise and produce only grayscale images, making it difficult to distinguish between objects. Several approaches to combining the visual and thermal modalities have been proposed, but they are limited by the assumption that both modalities are performing equally well. When one modality fails, existing approaches are unable to detect the drop in performance and disregard the underperforming modality. In this paper, a novel middle-fusion approach for combining visual and thermal spectrum images for object tracking is proposed. Motion and object detection is performed on each modality, and the object detection results for each modality are fused based on its current performance. Modality performance is determined by comparing the number of objects tracked by the system with the number detected by each mode, with a small allowance made for objects entering and exiting the scene. The tracking performance of the proposed fusion scheme is compared with the performance of the visual and thermal modes individually, and with a baseline middle-fusion scheme. Improvement in tracking performance using the proposed fusion approach is demonstrated. The proposed approach is also shown to detect the failure of an individual modality and disregard its results, ensuring performance is not degraded in such situations.
Local Feature Based Person Reidentification in Infrared Image Sequences
K. Jüngling, Michael Arens. AVSS 2010. doi:10.1109/AVSS.2010.75

In this paper, we address the task of appearance-based person reidentification in infrared image sequences. While common approaches to appearance-based person reidentification in the visible spectrum acquire color histograms of a person, this technique is not applicable in infrared for obvious reasons. To tackle the more difficult problem of person reidentification in infrared, we introduce an approach that relies on local image features only, and is thus completely independent of sensor-specific features that might be available only in the visible spectrum. Our approach fits into an Implicit Shape Model (ISM) based person detection and tracking strategy described in previous work. Local features collected during tracking are employed for person reidentification, while the generalizing appearance codebook used for person detection serves as a structuring element to generate person signatures. By this, we gain an integrated approach that allows for fast online model generation, a compact representation, and fast model matching. Since the model allows for a joint representation of appearance and spatial information, no complex representation models such as graph structures are needed. We evaluate our person reidentification approach on a subset of the CASIA infrared dataset.
Subjective Logic Based Hybrid Approach to Conditional Evidence Fusion for Forensic Visual Surveillance
Seunghan Han, Bonjung Koo, A. Hutter, V. Shet, W. Stechele. AVSS 2010. doi:10.1109/AVSS.2010.19

In forensic analysis of visual surveillance data, conditional knowledge representation and inference under uncertainty play an important role in deriving new contextual cues by fusing relevant evidential patterns. To address this aspect, both rule-based (extensional) and state-based (intensional) approaches have been adopted for situation or visual event analysis. The former provides flexible expressive power and computational efficiency but typically allows only unidirectional inference. The latter is computationally expensive but allows bidirectional interpretation of conditionals by treating the antecedent and consequent of a conditional as mutually relevant states. In visual surveillance, considering the varying semantics and potentially ambiguous causality in conditionals, it would be useful to combine the expressive power of rule-based systems with the ability of bidirectional interpretation. In this paper, we propose a hybrid approach that, while relying mainly on a rule-based architecture, also provides an intensional way of on-demand conditional modeling using conditional operators in subjective logic. We first show how conditionals can be assessed via explicit representation of ignorance in subjective logic. We then describe the proposed hybrid conditional handling framework. Finally, we present an experimental case study on a typical airport scene taken from visual surveillance data.
Task-Oriented Object Tracking in Large Distributed Camera Networks
Eduardo Monari, K. Kroschel. AVSS 2010. doi:10.1109/AVSS.2010.66

In this paper, a task-oriented approach for object tracking in large distributed camera networks is presented. This work includes three main contributions. First, a generic process framework designed for task-oriented video processing is presented. Second, the system components of the task-oriented framework needed for multi-camera person tracking are introduced in detail. Third, since efficient task-oriented processing in large camera networks requires dynamic sensor scheduling by the multi-camera tracking processes, an efficient sensor selection approach is proposed.
Human Action Recognition and Localization in Video Using Structured Learning of Local Space-Time Features
Tuan Hue Thi, Jian Zhang, Li Cheng, Li Wang, S. Satoh. AVSS 2010. doi:10.1109/AVSS.2010.76

This paper presents a unified framework for human action classification and localization in video using structured learning of local space-time features. Each human action class is represented by its own compact set of local patches. In our approach, we first use a discriminative hierarchical Bayesian classifier to select those space-time interest points that are constructive for each particular action. These concise local features are then passed to a Support Vector Machine with Principal Component Analysis projection for the classification task. Meanwhile, action localization is performed using Dynamic Conditional Random Fields developed to incorporate the spatial and temporal structure constraints of superpixels extracted around those features. Each superpixel in the video is defined by the shape and motion information of its corresponding feature region. Compelling results from experiments on the KTH [22], Weizmann [1], HOHA [13] and TRECVid [23] datasets demonstrate the efficiency and robustness of our framework for human action recognition and localization in video.
Histogram-Based Training Initialisation of Hidden Markov Models for Human Action Recognition
Z. Moghaddam, M. Piccardi. AVSS 2010. doi:10.1109/AVSS.2010.25

Human action recognition is often addressed by means of latent-state models such as the hidden Markov model and similar graphical models. As such models require Expectation-Maximisation training, arbitrary choices must be made for training initialisation, with major impact on the final recognition accuracy. In this paper, we propose a histogram-based deterministic initialisation and compare it with both random and time-based deterministic initialisations. Experiments on a human action dataset show that the accuracy of the proposed method is higher than that of the other tested methods.
Surveillance Camera Calibration from Observations of a Pedestrian
M. Evans, J. Ferryman. AVSS 2010. doi:10.1109/AVSS.2010.32

Calibrated cameras are an extremely useful resource in computer vision scenarios. Typically, cameras are calibrated using calibration targets or measurements of the observed scene, or self-calibrated through features matched between cameras with overlapping fields of view. This paper considers an approach to camera calibration based on observations of a pedestrian and compares the resulting calibration to a commonly used approach that requires measurements of the scene.