Understanding crowd dynamics is an interesting problem in computer vision owing to its various applications. We propose a dynamical system that models the collective motion of a crowd. The model learns the spatio-temporal interaction pattern of the crowd from track data captured over a period of time, and is trained under a least-squares formulation with spatial and temporal constraints: the spatial constraint restricts the model to the neighbors of a particular agent, and the temporal constraint enforces temporal smoothness. We also propose an effective group detection algorithm that utilizes the eigenvectors of the model's interaction matrix, casting group detection as a spectral clustering problem. Extensive experimentation demonstrates the superior performance of our group detection algorithm over state-of-the-art methods.
{"title":"Crowd motion analysis for group detection","authors":"Neha Bhargava, S. Chaudhuri","doi":"10.1145/3009977.3010071","DOIUrl":"https://doi.org/10.1145/3009977.3010071","url":null,"abstract":"Understanding crowd dynamics is an interesting problem in computer vision owing to its various applications. We propose a dynamical system to model the dynamics of collective motion of the crowd. The model learns the spatio-temporal interaction pattern of the crowd from the track data captured over a time period. The model is trained under a least square formulation with spatial and temporal constraints. The spatial constraint allows the model to consider only the neighbors of a particular agent and the temporal constraint enforces temporal smoothness in the model. We also propose an effective group detection algorithm that utilizes the eigenvectors of the interaction matrix of the model. The group detection is cast as a spectral clustering problem. Extensive experimentation demonstrates a superlative performance of our group detection algorithm over state-of-the-art methods.","PeriodicalId":93806,"journal":{"name":"Proceedings. Indian Conference on Computer Vision, Graphics & Image Processing","volume":"38 1","pages":"21:1-21:6"},"PeriodicalIF":0.0,"publicationDate":"2016-12-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"80610686","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
We propose a novel deep framework, TraCount, for counting highly overlapping vehicles in congested traffic scenes. TraCount uses multiple fully convolutional (FC) sub-networks to predict the density map for a given static image of a traffic scene. The different FC sub-networks provide a range of receptive-field sizes, enabling us to count vehicles whose perspective effect varies significantly across a scene due to the large visual field of surveillance cameras. The predictions of the different FC sub-networks are fused by weighted averaging to obtain a final density map. We show that TraCount outperforms state-of-the-art methods on the challenging TRANCOS dataset, which has a total of 46,796 vehicles annotated across 1,244 images.
{"title":"TraCount: a deep convolutional neural network for highly overlapping vehicle counting","authors":"Shiv Surya, R. Venkatesh Babu","doi":"10.1145/3009977.3010060","DOIUrl":"https://doi.org/10.1145/3009977.3010060","url":null,"abstract":"We propose a novel deep framework, TraCount, for highly overlapping vehicle counting in congested traffic scenes. TraCount uses multiple fully convolutional(FC) sub-networks to predict the density map for a given static image of a traffic scene. The different FC sub-networks provide a range in size of receptive fields that enable us to count vehicles whose perspective effect varies significantly in a scene due to the large visual field of surveillance cameras. The predictions of different FC sub-networks are fused by weighted averaging to obtain a final density map.\u0000 We show that TraCount outperforms the state of the art methods on the challenging TRANCOS dataset that has a total of 46796 vehicles annotated across 1244 images.","PeriodicalId":93806,"journal":{"name":"Proceedings. Indian Conference on Computer Vision, Graphics & Image Processing","volume":"38 1","pages":"46:1-46:6"},"PeriodicalIF":0.0,"publicationDate":"2016-12-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"74523642","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Automatic detection of TV advertisements is of paramount importance for media monitoring agencies. Existing work in this domain has mostly focused on news channels using news-specific features, and most commercial products use near-copy detection algorithms instead of generic advertisement classification. A generic detector needs to handle the inter-class and intra-class imbalances present in the data, which arise from variability in content aired across channels and frequent repetition of advertisements. These imbalances bias classifiers towards one of the classes and thus require special treatment. We propose a tree of perceptrons to solve this problem. The training data available at each perceptron node is balanced using cluster-based over-sampling and Tomek-link cleaning as we traverse the tree downwards; the trained perceptron node then passes the original unbalanced data to its children. This process is repeated recursively until we reach the leaf nodes. We call this new algorithm the "Progressively Balanced Perceptron Tree". We have also contributed a TV advertisement dataset consisting of 250 hours of video recorded from five non-news TV channels of different genres. Experiments on this dataset show that the proposed approach achieves superior and more balanced performance than six baseline methods. Our approach generalizes well across channels and training data sizes, achieving a top F1-score of 97% in detecting advertisements.
{"title":"Generic TV advertisement detection using progressively balanced perceptron trees","authors":"Raghvendra Kannao, P. Guha","doi":"10.1145/3009977.3009995","DOIUrl":"https://doi.org/10.1145/3009977.3009995","url":null,"abstract":"Automatic detection of TV advertisements is of paramount importance for various media monitoring agencies. Existing works in this domain have mostly focused on news channels using news specific features. Most commercial products use near copy detection algorithms instead of generic advertisement classification. A generic detector needs to handle inter-class and intra-class imbalances present in data due to variability in content aired across channels and frequent repetition of advertisements. Imbalances present in data make classifiers biased towards one of the classes and thus require special treatment. We propose to use tree of perceptrons to solve this problem. The training data available for each perceptron node is balanced using cluster based over-sampling and TOMEK link cleaning as we traverse the tree downwards. The trained perceptron node then passes the original unbalanced data to its children. This process is repeated recursively till we reach the leaf nodes. We call this new algorithm as \"Progressively Balanced Perceptron Tree\". We have also contributed a TV advertisements dataset consisting of 250 hours of videos recorded from five non-news TV channels of different genres. Experimentations on this dataset have shown that the proposed approach has comparatively superior and balanced performance with respect to six baseline methods. Our proposal generalizes well across channels, with varying training data sizes and achieved a top F1-score of 97% in detecting advertisements.","PeriodicalId":93806,"journal":{"name":"Proceedings. Indian Conference on Computer Vision, Graphics & Image Processing","volume":"15 1","pages":"8:1-8:8"},"PeriodicalIF":0.0,"publicationDate":"2016-12-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"74600618","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Compressed sensing or compressive sampling (CS), apart from its intrinsic application of sub-sampled signal reconstruction, has recently been explored extensively in the design of bandwidth-preserving, energy-efficient wireless networks. At the same time, due to the open nature of the wireless channel, transmitted digital media needs protection from unauthorized access, and digital watermarking has emerged over the years as one potential solution. Among the various methods, spread spectrum (SS) watermarking is efficient due to its robustness and imperceptibility. SS watermarking of digital images in the presence of additive and multiplicative noise has been studied extensively; to the best of our knowledge, however, CS-SS watermarking in the presence of both multiplicative (fading channel) and additive noise remains little explored in the existing literature. To address this problem, we propose a wireless communication theoretic model to develop an improved detection scheme for an additive SS image watermarking framework. The system model considers sub-sampled (CS) transmission of the watermarked image over both non-fading and fading channels. We then develop a diversity-assisted weighted combining scheme for improved watermark detection. An optimization problem is formulated in which the weight for each individual link is calculated through an eigenfilter approach to maximize the watermark detection probability at a fixed false alarm rate under an embedding power (strength) constraint. A large set of simulation results validates the mathematical model of the diversity-assisted compressive watermark detector.
{"title":"On improved CS-SS image watermark detection over radio mobile channel","authors":"A. Bose, S. Maity","doi":"10.1145/3009977.3010049","DOIUrl":"https://doi.org/10.1145/3009977.3010049","url":null,"abstract":"Recently compressed sensing or compressive sampling (CS), apart from its intrinsic applications of sub-sample signal reconstruction, is explored a lot in the design of bandwidth preserving-energy efficient wireless networks. At the same time, due to open nature of wireless channel, digital data (media) transmission needs their protection from unauthorized access and digital watermarking has been devised as one form of potential solution over the years. Among the various methods, spread spectrum (SS) watermarking is found to be efficient due to its improved robustness and imperceptibility. SS watermarking on digital images in presence of additive and multiplicative noise is studied a lot. To the best of knowledge, CS-SS watermarking in presence of both multiplicative (fading channel) and additive noise is not explored much in the existing literature. To address this problem, a wireless communication theoretic model is suggested here to develop an improved detection scheme on additive SS image watermark framework. System model considers sub-sample (CS) transmission of the watermarked image over both non-fading and fading channel. Then a diversity assisted weighted combining scheme for the improved watermark detection is developed. An optimization problem is formulated where the weight for the individual link is calculated through eigen filter approach to maximize the watermark detection probability for a fixed false alarm rate under the constraint of an embedding power (strength). A large set of simulation results validate the mathematical model of the diversity assisted compressive watermark detector.","PeriodicalId":93806,"journal":{"name":"Proceedings. Indian Conference on Computer Vision, Graphics & Image Processing","volume":"6 1","pages":"60:1-60:8"},"PeriodicalIF":0.0,"publicationDate":"2016-12-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"83817376","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Recently, interest in Micro Aerial Vehicles (MAVs) and their autonomous flight has increased tremendously, and significant advances have been made. The monocular camera has turned out to be the most popular sensing modality for MAVs, as it is light-weight, consumes little power, and encodes rich information about the surrounding environment. In this paper, we present DeepFly, our framework for autonomous navigation of a quadcopter equipped with a monocular camera. Navigable-space detection and waypoint selection are fundamental components of an autonomous navigation system, and they mean more than just detecting and avoiding immediate obstacles: finding navigable space places equal emphasis on avoiding obstacles and on detecting ideal regions to move to next. An ideal region has two properties: 1) all points in the region have approximately the same high depth value, and 2) the area covered by the region's points in the disparity map is considerably large. Waypoints selected from these navigable spaces assure a collision-free path, which is safer than paths obtained from waypoint selection methods that do not consider neighboring information. In our approach, we obtain a dense disparity map by performing a translation maneuver. This disparity map is input to a deep neural network that predicts bounding boxes for multiple navigable regions. Our deep convolutional neural network with shortcut connections regresses a variable number of outputs without any complex architectural add-ons. Our autonomous navigation approach has been successfully tested in both indoor and outdoor environments and across a range of lighting conditions.
{"title":"DeepFly: towards complete autonomous navigation of MAVs with monocular camera","authors":"Utsav Shah, Rishabh Khawad, K. Krishna","doi":"10.1145/3009977.3010047","DOIUrl":"https://doi.org/10.1145/3009977.3010047","url":null,"abstract":"Recently, the interest in Micro Aerial Vehicles (MAVs) and their autonomous flights has increased tremendously and significant advances have been made. The monocular camera has turned out to be most popular sensing modality for MAVs as it is light-weight, does not consume more power, and encodes rich information about the environment around. In this paper, we present DeepFly, our framework for autonomous navigation of a quadcopter equipped with monocular camera. The navigable space detection and waypoint selection are fundamental components of autonomous navigation system. They have broader meaning than just detecting and avoiding immediate obstacles. Finding the navigable space emphasizes equally on avoiding obstacles and detecting ideal regions to move next to. The ideal region can be defined by two properties: 1) All the points in the region have approximately same high depth value and 2) The area covered by the points of the region in the disparity map is considerably large. The waypoints selected from these navigable spaces assure collision-free path which is safer than path obtained from other waypoint selection methods which do not consider neighboring information.\u0000 In our approach, we obtain a dense disparity map by performing a translation maneuver. This disparity map is input to a deep neural network which predicts bounding boxes for multiple navigable regions. Our deep convolutional neural network with shortcut connections regresses variable number of outputs without any complex architectural add on. Our autonomous navigation approach has been successfully tested in both indoors and outdoors environment and in range of lighting conditions.","PeriodicalId":93806,"journal":{"name":"Proceedings. Indian Conference on Computer Vision, Graphics & Image Processing","volume":"27 1","pages":"59:1-59:8"},"PeriodicalIF":0.0,"publicationDate":"2016-12-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"83063301","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Brain mapping research is facilitated by first aligning digital images of mouse brain slices to a standardized atlas framework such as the Allen Reference Atlas (ARA). However, conventional processing of these brain slices introduces many histological artifacts, such as tears and missing regions in the tissue, which make the automatic alignment process extremely challenging. We present an end-to-end, fully automatic registration pipeline that aligns digital images of mouse brain slices, which may contain histological artifacts, to a standardized atlas space. We take a geometric approach: we first align the bounding boxes of the convex hulls of the brain slice contours and the atlas template contours, which are extracted using a variant of the Canny edge detector. We then detect artifacts using Constrained Delaunay Triangulation (CDT) and remove them from the contours before performing global point alignment using iterative closest point (ICP). This is followed by a final non-linear registration obtained by solving Laplace's equation with Dirichlet boundary conditions. We tested our algorithm on 200 mouse brain slice images, including slices acquired through conventional processing techniques with major histological artifacts and slices from serial two-photon tomography (STPT) with no major artifacts. We show significant improvement over other registration techniques, both qualitatively and quantitatively, on all slices, especially those with significant histological artifacts.
{"title":"Robust registration of Mouse brain slices with severe histological artifacts","authors":"Nitin Agarwal, Xiangmin Xu, M. Gopi","doi":"10.1145/3009977.3010053","DOIUrl":"https://doi.org/10.1145/3009977.3010053","url":null,"abstract":"Brain mapping research is facilitated by first aligning digital images of mouse brain slices to standardized atlas framework such as the Allen Reference Atlas (ARA). However, conventional processing of these brain slices introduces many histological artifacts such as tears and missing regions in the tissue, which make the automatic alignment process extremely challenging. We present an end-to-end fully automatic registration pipeline for alignment of digital images of mouse brain slices that may have histological artifacts, to a standardized atlas space. We use a geometric approach where we first align the bounding box of convex hulls of brain slice contours and atlas template contours, which are extracted using a variant of Canny edge detector. We then detect the artifacts using Constrained Delaunay Triangulation (CDT) and remove them from the contours before performing global alignment of points using iterative closest point (ICP). This is followed by a final non-linear registration by solving the Laplace's equation with Dirichlet boundary conditions. We tested our algorithm on 200 mouse brain slice images including slices acquired from conventional processing techniques having major histological artifacts, and from serial two-photon tomography (STPT) with no major artifacts. We show significant improvement over other registration techniques, both qualitatively and quantitatively, in all slices especially on slices with significant histological artifacts.","PeriodicalId":93806,"journal":{"name":"Proceedings. Indian Conference on Computer Vision, Graphics & Image Processing","volume":"36 1","pages":"10:1-10:8"},"PeriodicalIF":0.0,"publicationDate":"2016-12-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"83158190","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
The double JPEG problem in image forensics has been gaining importance: it involves two compression cycles, and tampering may have taken place after the first cycle, calling for accurate methods to detect and localize the introduced tampering. First quantization matrix estimation, which retrieves the missing quantization table of the first cycle, is one way of authenticating double-compressed JPEG images. This paper presents a robust method for first quantization matrix estimation in double-compressed JPEG images by improving the selection strategy that chooses the quantization estimate from the filtered DCT histograms. The selection strategy is made robust by increasing the available statistics using the DCT coefficients of the double-compressed image under investigation, performing a relative comparison between the obtained histograms, and applying a novel priority assignment and selection step that accurately estimates the first quantization value. Experimental testing and comparative analysis against two state-of-the-art methods show the robustness of the proposed method for accurate first quantization estimation. The proposed method finds application in image forensics as well as in steganalysis.
{"title":"First quantization matrix estimation for double compressed JPEG images utilizing novel DCT histogram selection strategy","authors":"N. Dalmia, M. Okade","doi":"10.1145/3009977.3010067","DOIUrl":"https://doi.org/10.1145/3009977.3010067","url":null,"abstract":"The Double JPEG problem in image forensics has been gaining importance since it involves two compression cycles and there is a possibility of tampering having taken place after the first cycle thereby calling for accurate methods to detect and localize the introduced tamper. First quantization matrix estimation which basically retrieves the missing quantization table of the first cycle is one of the ways of image authentication for Double JPEG images. This paper presents a robust method for first quantization matrix estimation in case of double compressed JPEG images by improving the selection strategy which chooses the quantization estimate from the filtered DCT histograms. The selection strategy is made robust by increasing the available statistics utilizing the DCT coefficients from the double compressed image under investigation coupled with performing relative comparison between the obtained histograms followed by a novel priority assignment and selection step, which accurately estimates the first quantization value. Experimental testing and comparative analysis with two state-of-art methods show the robustness of the proposed method for accurate first quantization estimation. The proposed method finds its application in image forensics as well as in steganalysis.","PeriodicalId":93806,"journal":{"name":"Proceedings. Indian Conference on Computer Vision, Graphics & Image Processing","volume":"1 1","pages":"27:1-27:8"},"PeriodicalIF":0.0,"publicationDate":"2016-12-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"90108629","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Object recognition is one of the challenging tasks in computer vision, and the problem becomes increasingly difficult when the image categories are visually correlated, i.e. they are visually similar with only fine differences among them. This paper has a two-fold objective. First, the image categories are organized in a hierarchical tree-like structure using self-tuning spectral clustering, exploiting the correlations among them. The organization phase is followed by a node-specific large margin nearest neighbor classification scheme, in which a Mahalanobis distance metric is learnt for each non-leaf node. Further, a procedure for hyperparameter selection is discussed with respect to two strategies, grid search and Bayesian optimization. The proposed algorithm's effectiveness is tested on selected classes of the popular ImageNet dataset.
{"title":"Hierarchical spectral clustering based large margin classification of visually correlated categories","authors":"Digbalay Bose, S. Chaudhuri","doi":"10.1145/3009977.3010064","DOIUrl":"https://doi.org/10.1145/3009977.3010064","url":null,"abstract":"Object recognition is one of the challenging tasks in computer vision and the problem becomes increasingly difficult when the image categories are visually correlated among themselves i.e. they are visually similar and only fine differences exist among the categories. This paper has a two-fold objective which involves organization of the image categories in a hierarchical tree like structure using self tuning spectral clustering for exploiting the correlations among them. The organization phase is followed by a node specific large margin nearest neighbor classification scheme, where a Mahalnobis distance metric is learnt for each non-leaf node. Further a procedure for hyperparameters selection has been discussed w.r.t two strategies i.e. grid search and Bayesian optimization. The proposed algorithm's effectiveness is tested on selected classes of the popular Imagenet dataset.","PeriodicalId":93806,"journal":{"name":"Proceedings. Indian Conference on Computer Vision, Graphics & Image Processing","volume":"160 1","pages":"48:1-48:8"},"PeriodicalIF":0.0,"publicationDate":"2016-12-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"80104390","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
This paper proposes a method for segmenting the nuclei of single/isolated and overlapping/touching immature white blood cells in microscopic images of B-lineage acute lymphoblastic leukemia (ALL) prepared from peripheral blood and bone marrow aspirate. We propose a deep belief network approach for the segmentation of these nuclei. Simulation results and comparison with existing methods demonstrate the efficacy of the proposed method.
{"title":"Overlapping cell nuclei segmentation in microscopic images using deep belief networks","authors":"Rahul Duggal, Anubha Gupta, Ritu Gupta, Manya Wadhwa, Chirag Ahuja","doi":"10.1145/3009977.3010043","DOIUrl":"https://doi.org/10.1145/3009977.3010043","url":null,"abstract":"This paper proposes a method for segmentation of nuclei of single/isolated and overlapping/touching immature white blood cells from microscopic images of B-Lineage acute lymphoblastic leukemia (ALL) prepared from peripheral blood and bone marrow aspirate. We propose deep belief network approach for the segmentation of these nuclei. Simulation results and comparison with some of the existing methods demonstrate the efficacy of the proposed method.","PeriodicalId":93806,"journal":{"name":"Proceedings. Indian Conference on Computer Vision, Graphics & Image Processing","volume":"54 1","pages":"82:1-82:8"},"PeriodicalIF":0.0,"publicationDate":"2016-12-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"77035065","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
This paper proposes an approach for event recognition in egocentric videos using dense trajectories over a Gradient Flow - Space Time Interest Point (GF-STIP) feature. We focus on recognizing events of diverse categories (including indoor and outdoor activities, sports, social activities, and adventures) in egocentric videos. Since all existing egocentric activity recognition datasets consist of indoor videos only, we introduce a dataset of diverse egocentric events containing 102 videos of 9 different events (with indoor and outdoor videos under varying lighting conditions). We extract Space Time Interest Points (STIP) from each frame of the video. The interest points are taken as lead pixels, and the Gradient-Weighted Optical Flow (GWOF) feature is calculated at each lead pixel by multiplying the optical flow magnitude and the gradient magnitude at that pixel, yielding the GF-STIP feature. We construct pose descriptors from the GF-STIP feature. We use the GF-STIP descriptors for recognizing events in egocentric videos with three different approaches: a Bag of Words (BoW) model, Fisher vectors, and dense trajectories. We show that dense trajectory features based on the proposed GF-STIP descriptors enhance the efficacy of event recognition in egocentric videos.
{"title":"Event recognition in egocentric videos using a novel trajectory based feature","authors":"Vinodh Buddubariki, Sunitha Gowd Tulluri, Snehasis Mukherjee","doi":"10.1145/3009977.3010011","DOIUrl":"https://doi.org/10.1145/3009977.3010011","url":null,"abstract":"This paper proposes an approach for event recognition in Egocentric videos using dense trajectories over Gradient Flow - Space Time Interest Point (GF-STIP) feature. We focus on recognizing events of diverse categories (including indoor and outdoor activities, sports and social activities and adventures) in egocentric videos. We introduce a dataset with diverse egocentric events, as all the existing egocentric activity recognition datasets consist of indoor videos only. The dataset introduced in this paper contains 102 videos with 9 different events (containing indoor and outdoor videos with varying lighting conditions). We extract Space Time Interest Points (STIP) from each frame of the video. The interest points are taken as the lead pixels and Gradient-Weighted Optical Flow (GWOF) features are calculated on the lead pixels by multiplying the optical flow measure and the magnitude of gradient at the pixel, to obtain the GF-STIP feature. We construct pose descriptors with the GF-STIP feature. We use the GF-STIP descriptors for recognizing events in egocentric videos with three different approaches: following a Bag of Words (BoW) model, implementing Fisher Vectors and obtaining dense trajectories for the videos. We show that the dense trajectory features based on the proposed GF-STIP descriptors enhance the efficacy of the event recognition system in egocentric videos.","PeriodicalId":93806,"journal":{"name":"Proceedings. Indian Conference on Computer Vision, Graphics & Image Processing","volume":"82 1","pages":"76:1-76:8"},"PeriodicalIF":0.0,"publicationDate":"2016-12-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"83921090","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}