Large-scale machine learning has recently become an active research area. Most existing clustering algorithms cannot handle big data because of their high time and space complexity. Among clustering algorithms, eigenvector-based methods such as spectral clustering show very good accuracy but have cubic time complexity. Various methods have been proposed to reduce the time and space complexity of the eigendecomposition, such as the Nyström method and the Lanczos method. The Nyström method has linear time complexity in the number of data points but cubic time complexity in the number of sampled points. To reduce this, various rank-k approximation methods have also been proposed, but they are less efficient than normalized spectral clustering. In this paper we propose a two-step algorithm for spectral clustering that reduces the time complexity to O(nmk + m²k') by combining the Nyström and Lanczos methods, where k is the number of clusters and k' is the rank of the low-rank approximation of the sampling matrix (k < k' << m << n). The algorithm shows very good results on various data sets, image segmentation problems, and churn prediction for a telecommunication data set, even with very low sampling (for a 10 million × 10 million matrix, only 100 columns sampled) and in less time, which confirms its validity.
Shahid K I and S. Chaudhury, "Scalable clustering and applications," in Proceedings of the Indian Conference on Computer Vision, Graphics & Image Processing, 2016, pp. 34:1-34:7. DOI: 10.1145/3009977.3010073
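As a rough illustration of the pipeline the abstract outlines, the sketch below combines Nyström column sampling with a Lanczos-based eigensolver (SciPy's eigsh) and clusters the resulting spectral embedding with k-means. It is a minimal, unnormalized sketch assuming a Gaussian affinity; it is not the authors' exact two-step algorithm, and the function name and parameter values are illustrative.

```python
import numpy as np
from scipy.sparse.linalg import eigsh          # Lanczos-based eigensolver
from scipy.spatial.distance import cdist
from sklearn.cluster import KMeans

def nystrom_spectral_clustering(X, m=100, k=3, k_prime=10, sigma=1.0):
    """Cluster n points using only m sampled landmarks (k < k' << m << n)."""
    n = X.shape[0]
    idx = np.random.choice(n, m, replace=False)
    # C: n x m block of the affinity matrix; W: m x m landmark affinities
    C = np.exp(-cdist(X, X[idx], 'sqeuclidean') / (2 * sigma ** 2))
    W = C[idx]
    # Rank-k' eigendecomposition of the small matrix W via Lanczos iterations
    vals, vecs = eigsh(W, k=k_prime, which='LM')   # eigenvalues ascending
    # Nystrom extension of the landmark eigenvectors to all n points
    U = C @ vecs / vals                            # divides column-wise
    # Keep the k leading components as the embedding, row-normalize, cluster
    emb = U[:, -k:] / (np.linalg.norm(U[:, -k:], axis=1, keepdims=True) + 1e-12)
    return KMeans(n_clusters=k, n_init=10).fit_predict(emb)
```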
The recent past has seen an inexorable shift towards the use of deep learning techniques to solve a myriad of problems in the field of medical imaging. In this paper, a novel segmentation method involving a fully-connected deep neural network, called Deep Segmentation Network (DSN), is proposed to perform supervised regression for brain extraction from T1-weighted magnetic resonance (MR) images. In contrast to existing patch-based feature learning techniques, DSN works on full 3D volumes, simplifying pre- and post-processing operations, and efficiently provides a voxel-wise binary mask delineating the brain region. The model is evaluated on three publicly available datasets and is observed to either outperform or perform comparably to state-of-the-art methods. DSN achieves a maximum and minimum Dice Similarity Coefficient (DSC) of 97.57 and 92.82, respectively, across all the datasets. Experiments conducted in this paper highlight the ability of the DSN model to automatically learn feature representations, making it a simple yet highly effective approach for brain segmentation. Preliminary experiments also suggest that the proposed model has the potential to segment sub-cortical structures accurately.
Apoorva Sikka, Gaurav Mittal, Deepti R. Bathula, and N. C. Krishnan, "Supervised deep segmentation network for brain extraction," in Proceedings of the Indian Conference on Computer Vision, Graphics & Image Processing, 2016, pp. 9:1-9:8. DOI: 10.1145/3009977.3010016
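The sketch below only shows the general shape of a fully-connected voxel-wise regressor and the DSC metric the abstract reports; the input resolution, layer widths, and choice of PyTorch are assumptions, not the paper's configuration.

```python
import torch
import torch.nn as nn

class DSNSketch(nn.Module):
    """Toy stand-in for the Deep Segmentation Network: a fully-connected
    regressor mapping a (downsampled) 3D volume to per-voxel brain
    probabilities. Sizes are illustrative, not the paper's."""
    def __init__(self, n_voxels=32 * 32 * 32, hidden=1024):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_voxels, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, n_voxels), nn.Sigmoid(),  # voxel-wise output
        )

    def forward(self, vol):                  # vol: (batch, D, H, W)
        return self.net(vol.flatten(1)).view_as(vol)

def dice(pred, mask, thr=0.5):
    """Dice Similarity Coefficient, the evaluation metric quoted above."""
    p = (pred > thr).float()
    return (2 * (p * mask).sum() / (p.sum() + mask.sum() + 1e-6)).item()
```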
In this paper, a novel framework is proposed for automatic recognition of facial expressions, where the face images are captured at multiple view angles (i.e., multi-view facial expressions). The proposed scheme introduces a local dominant binary pattern (LDBP). Unlike uniform-LBP-based features, LDBP uses a lower feature dimension without affecting recognition performance. The LDBP is computed by combining LBP with the dominant orientations of neighborhood pixels. Eigenvalue analysis of the structure-tensor representation of expressive face images determines the dominant directions of gray-value change in the local neighborhood of each pixel. We use an SVM for view-specific classification of multi-view facial expressions. The proposed model is evaluated on benchmark datasets of both near-frontal (CK+ and JAFFE) and multi-view (KDEF, SFEW and LFPW) face images. The datasets include both posed and spontaneous expressions. The proposed scheme outperforms the state of the art by approximately 1% for near-frontal facial expressions and by at least 3% on average for multi-view facial expressions.
Bikash Santra and D. Mukherjee, "Local dominant binary patterns for recognition of multi-view facial expressions," in Proceedings of the Indian Conference on Computer Vision, Graphics & Image Processing, 2016, pp. 25:1-25:8. DOI: 10.1145/3009977.3010008
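The structure-tensor step in the abstract has a standard closed form; the sketch below computes, per pixel, the dominant direction of gray-value change from the smoothed structure tensor. How LDBP then restricts the binary pattern to these orientations is specified in the paper; the window size and function name here are assumptions.

```python
import numpy as np
from scipy.ndimage import sobel, uniform_filter

def dominant_orientation(img, win=5):
    """Per-pixel dominant direction of gray-value change, from the
    eigen-analysis of the locally smoothed structure tensor."""
    Ix = sobel(img, axis=1)                  # horizontal derivative
    Iy = sobel(img, axis=0)                  # vertical derivative
    Jxx = uniform_filter(Ix * Ix, win)       # smoothed tensor entries
    Jxy = uniform_filter(Ix * Iy, win)
    Jyy = uniform_filter(Iy * Iy, win)
    # Angle of the principal eigenvector of [[Jxx, Jxy], [Jxy, Jyy]]
    return 0.5 * np.arctan2(2.0 * Jxy, Jxx - Jyy)
```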
In this paper, a feature-preserving denoising scheme for fluorescence video microscopy is presented. Fluorescence image sequences comprise edges and fine structures together with fast-moving objects. Improving the signal-to-noise ratio (SNR) while preserving structural details is a difficult task for these image sequences. Some existing denoising techniques over-smooth these image sequences, while others fail due to inappropriate implementation of motion estimation and compensation steps. In this paper we use a nonlocal means (NLM) video denoising algorithm so as to avoid motion estimation and compensation steps. The proposed shot boundary detection technique pre-processes the sequence systematically and accurately to form shots of content-wise similar frames. To preserve the edges and fine structural details in the image sequences, we modify the weighting term of the NLM filter. Further, to accelerate the denoising process, a separable nonlocal means filter is implemented for video sequences. We compare the results with existing fluorescence video denoising techniques and show that the proposed method not only preserves edges and small structural details more effectively but also reduces the computational time. The efficacy of the proposed algorithm is evaluated quantitatively and qualitatively with PSNR and visual perception.
H. Bhujle, "Feature-preserving 3D fluorescence image sequence denoising," in Proceedings of the Indian Conference on Computer Vision, Graphics & Image Processing, 2016, pp. 45:1-45:8. DOI: 10.1145/3009977.3009983
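For reference, the block below is a baseline shift-based NLM filter for a single frame; the paper's contributions (the modified weighting term, the separable formulation, and the shot-boundary pre-processing) sit on top of this standard scheme, and the parameter values are illustrative.

```python
import numpy as np
from scipy.ndimage import uniform_filter

def nlm_denoise(frame, search=5, patch=3, h=0.1):
    """Baseline shift-based non-local means for one float frame in [0, 1]:
    each pixel becomes a weighted average of pixels whose surrounding
    patches look similar within a (search x search) window."""
    half = search // 2
    padded = np.pad(frame, half, mode='reflect')
    H, W = frame.shape
    num = np.zeros_like(frame)
    den = np.zeros_like(frame)
    for dy in range(-half, half + 1):
        for dx in range(-half, half + 1):
            shifted = padded[half + dy: half + dy + H,
                             half + dx: half + dx + W]
            # Patch distance: box-filtered squared difference of the shift
            d2 = uniform_filter((frame - shifted) ** 2, patch)
            w = np.exp(-d2 / (h * h))
            num += w * shifted
            den += w
    return num / den
```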
Node-link diagrams provide an intuitive way to explore networks and have inspired a large number of automated graph layout strategies that optimize aesthetic criteria. However, no single drawing approach can fully satisfy all these criteria simultaneously, so evaluation methods are designed to explore the advantages and disadvantages of different graph layout methods against these criteria. Starting from visual perception, this paper analyzes the visual importance of nodes based on a user experiment and designs a model to measure it. The pros and cons of graph layout methods are then evaluated by comparing the topological importance and visual importance of nodes. A heatmap-based visualization provides visual feedback on the difference between the topological and visual importance of nodes, and a metric is built to quantify the difference precisely. Finally, experiments are conducted on data sets of different scales to further analyze the characteristics of these graph layout methods.
Jiafan Li, Yuhua Liu, and Changbo Wang, "Evaluation of graph layout methods based on visual perception," in Proceedings of the Indian Conference on Computer Vision, Graphics & Image Processing, 2016, pp. 90:1-90:7. DOI: 10.1145/3009977.3010070
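The abstract does not spell the metric out, so the following is only a hypothetical instantiation: score a layout by how far its visual-prominence ranking departs from a topological-importance ranking, using one minus the Spearman rank correlation. The function name and both inputs are assumed, not the paper's.

```python
from scipy.stats import spearmanr

def layout_discrepancy(topological, visual):
    """Hypothetical metric: 1 - Spearman rank correlation between a
    node's topological importance (e.g. degree) and its measured visual
    importance; 0 means the two rankings agree perfectly."""
    rho, _ = spearmanr(topological, visual)
    return 1.0 - rho

# e.g. topological = node degrees; visual = scores from a perception model
print(layout_discrepancy([5, 3, 2, 8, 1], [0.7, 0.4, 0.3, 0.9, 0.1]))
```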
Unsupervised domain adaptation (DA) techniques inherently assume the presence of an ample amount of source-domain training samples in addition to the target-domain test data. The domains are characterized by domain-specific probability distributions governing the data, which are substantially different from each other. The goal is to build a task-oriented classifier model that performs proportionately in both domains. In contrast to the standard unsupervised DA setup, we propose a maximum-margin clustering (MMC) based framework that does not require source-domain labeled samples. Instead, we formulate the task as a joint clustering problem over all the samples from both domains in a common feature subspace. The Geodesic Flow Kernel (GFK) based subspace projection technique on the Grassmannian manifold is adopted to cast the samples into a domain-invariant space. The MMC stage then groups the data by maximizing margins, and a classifier is learned for each group. The data-overlap problem is handled by learning a dedicated SVM-KNN classifier for the potentially unreliable samples in each group. We validate the framework on a pair of remote sensing images of different modalities for land-cover classification and on a generic object dataset for recognition. We observe that the proposed method performs on par with the fully supervised case for both tasks, but without the requirement of costly annotations.
Sudipan Saha, Biplab Banerjee, and S. Merchant, "Unsupervised domain adaptation without source domain training samples: a maximum margin clustering based approach," in Proceedings of the Indian Conference on Computer Vision, Graphics & Image Processing, 2016, pp. 56:1-56:8. DOI: 10.1145/3009977.3010033
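To make the pipeline's structure concrete, here is a deliberately simplified sketch with stand-ins for each stage: GFK is replaced by plain PCA and MMC by k-means, so this is only the skeleton of the approach, with every name and parameter assumed.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans
from sklearn.svm import SVC

def adapt_and_cluster(Xs, Xt, k, dim=20):
    """Simplified stand-in for the paper's pipeline: embed both domains in
    a shared subspace (plain PCA here, where the paper uses the Geodesic
    Flow Kernel on the Grassmannian), jointly cluster all samples (k-means
    here, where the paper uses maximum-margin clustering), and fit a
    classifier on the groupings (the paper refines unreliable samples
    with SVM-KNN)."""
    Z = PCA(n_components=dim).fit_transform(np.vstack([Xs, Xt]))
    labels = KMeans(n_clusters=k, n_init=10).fit_predict(Z)
    clf = SVC(kernel='linear').fit(Z, labels)
    return labels, clf
```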
Virtual garments like shirts and trousers are created from 2D patterns stitched over 3D models. However, Indian garments, like dhotis and saris, pose a unique draping challenge for physically-simulated garment systems, as they are not stitched garments. We present a method to intuitively specify the parameters governing the drape of an Indian garment using a sketch-based interface. We then translate the sketch strokes into procedural, physically-simulated draping routines that wrap, pin and tuck the garments around the body mesh as needed. After draping, the garments are ready to be simulated and used during animation as required. We present several examples of our draping technique.
Sanjeev Muralikrishnan and P. Chaudhuri, "Sketch-based simulated draping for Indian garments," in Proceedings of the Indian Conference on Computer Vision, Graphics & Image Processing, 2016, pp. 92:1-92:6. DOI: 10.1145/3009977.3010001
Deep Learning (DL) methods extract complex sets of features using architectures containing hierarchical sets of layers. The features so learned have high discriminative power and thus represent the input to the network in the most efficient manner. Convolutional Neural Networks (CNNs), one such deep learning architecture, extract structural features with some invariance to small translations, scaling and other forms of distortion. In this paper, the learning capabilities of CNNs are explored with the aim of improving the rotational invariance of the architecture. We propose a new CNN architecture, called RICNN, with an additional layer formed by differential excitation against distance to improve rotational invariance. Moreover, we show that the proposed method gives superior invariance to rotations compared to the original CNN architecture (training samples with different orientations are not considered), without disturbing the invariance to small translations, scaling and other forms of distortion. Profiles such as training time, testing time and accuracy are evaluated at different percentages of training data to compare the performance of the proposed configuration with the original configuration.
Haribabu Kandi, Deepak Mishra, and G. R. S. Subrahmanyam, "A differential excitation based rotational invariance for convolutional neural networks," in Proceedings of the Indian Conference on Computer Vision, Graphics & Image Processing, 2016, pp. 70:1-70:8. DOI: 10.1145/3009977.3009978
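The abstract does not give the layer's exact formula, so the sketch below shows the classical Weber-style differential excitation (arctan of the summed neighbor differences over the center intensity), which is isotropic and hence insensitive to rotation. Treat it as an assumed approximation of the RICNN layer, written in PyTorch.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DifferentialExcitation(nn.Module):
    """Rotation-insensitive map in the spirit of the paper's extra layer:
    Weber-style differential excitation over a 3x3 neighborhood (the
    paper's exact layer, formed against distance, may differ)."""
    def __init__(self):
        super().__init__()
        k = torch.ones(1, 1, 3, 3)
        k[0, 0, 1, 1] = -8.0          # sum of 8 neighbors minus 8 * center
        self.register_buffer('kernel', k)

    def forward(self, x):             # x: (B, 1, H, W), intensities >= 0
        diff = F.conv2d(x, self.kernel, padding=1)  # sum of (x_i - x_c)
        return torch.atan(diff / (x + 1e-6))
```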
We propose a novel technique for event geo-localization (i.e., the 2-D location of the event on the surface of the earth) from the sensor metadata of crowd-sourced videos collected from smartphone devices. With the help of sensors available in smartphones, such as the digital compass and GPS receiver, we collect metadata such as camera viewing direction and location along with the video. Event localization is then posed as a constrained optimization problem using the available sensor metadata. Our results on the collected experimental data show correct localization of events, which is particularly challenging for classical vision-based methods because of the nature of the visual data. Since our approach uses only sensor metadata, its computational overhead is much lower than it would be if the video content were used. Finally, we illustrate the benefits of our work in analyzing video data from multiple sources through geo-localization.
Amit More and S. Chaudhuri, "Event geo-localization and tracking from crowd-sourced video metadata," in Proceedings of the Indian Conference on Computer Vision, Graphics & Image Processing, 2016, pp. 24:1-24:8. DOI: 10.1145/3009977.3009993
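The abstract poses localization as constrained optimization over camera locations and viewing directions; one plausible, minimal instantiation is the least-squares intersection of bearing rays shown below. Positions are assumed already projected to a local planar frame, and the authors' actual constraints are in the paper.

```python
import numpy as np

def localize_event(positions, bearings):
    """Least-squares intersection of camera viewing rays.
    positions: (n, 2) camera locations in a local planar frame;
    bearings: (n,) compass angles in radians (0 = north, clockwise).
    Requires at least two non-parallel rays, otherwise A is singular."""
    A = np.zeros((2, 2))
    b = np.zeros(2)
    for p, theta in zip(positions, bearings):
        d = np.array([np.sin(theta), np.cos(theta)])  # unit view direction
        P = np.eye(2) - np.outer(d, d)   # projector orthogonal to the ray
        A += P                           # accumulate normal equations for
        b += P @ p                       # min_x sum ||P_i (x - p_i)||^2
    return np.linalg.solve(A, b)
```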
Tracking the ball in a soccer video is challenging because of sudden changes in the speed and direction of the ball. Successful tracking in such a scenario depends on the ability of the algorithm to continuously balance prior constraints against the evidence garnered from the sequence of images. This paper proposes a particle filter based algorithm that keeps tracking the ball when it suddenly changes direction or moves at high speed. Exact, deterministic tracking algorithms based on discretized functionals suffer from severe limitations in the form of prior constraints. Our tracking algorithm has shown excellent results even under partial occlusion, which is a major concern in soccer video. We show that the proposed tracking algorithm is at least 7.2% better than competing approaches for soccer ball tracking.
Samriddha Sanyal, A. Kundu, and D. Mukherjee, "On the (soccer) ball," in Proceedings of the Indian Conference on Computer Vision, Graphics & Image Processing, 2016, pp. 53:1-53:8. DOI: 10.1145/3009977.3010022
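As a reminder of how a particle filter balances a motion prior against image evidence, here is one generic predict-update-resample cycle. The paper's specific motion and observation models are its own; `observe_fn`, the state layout, and the noise scale here are assumptions.

```python
import numpy as np

def particle_filter_step(particles, weights, observe_fn, motion_std=5.0):
    """One predict-update-resample cycle for ball tracking.
    particles: (N, 4) array of [x, y, vx, vy]; observe_fn maps an (x, y)
    hypothesis to a positive appearance likelihood in the current frame."""
    N = len(particles)
    # Predict: constant-velocity prior plus noise, so some particles survive
    # the sudden speed and direction changes the abstract highlights
    particles[:, :2] += particles[:, 2:]
    particles += np.random.normal(0.0, motion_std, particles.shape)
    # Update: re-weight each hypothesis by the image evidence
    weights = weights * np.array([observe_fn(p[:2]) for p in particles])
    weights = weights / weights.sum()
    # Resample when the effective sample size collapses
    if 1.0 / np.sum(weights ** 2) < N / 2:
        idx = np.random.choice(N, size=N, p=weights)
        particles, weights = particles[idx], np.full(N, 1.0 / N)
    return particles, weights
```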