
Latest publications: Proceedings. Indian Conference on Computer Vision, Graphics & Image Processing

Pap smear image classification using convolutional neural network
Pub Date : 2016-12-18 DOI: 10.1145/3009977.3010068
K. Bora, M. Chowdhury, L. Mahanta, M. Kundu, A. Das
This article presents the results of a comprehensive study on deep learning based computer-aided diagnostic techniques for the classification of cervical dysplasia using Pap smear images. All experiments are performed on a real indigenous image database containing 1611 images, generated at two diagnostic centres. The focus is on constructing an effective feature vector that captures multiple levels of representation of the features hidden in a Pap smear image. For this purpose, a deep convolutional neural network is used, followed by feature selection using an unsupervised technique with the Maximal Information Compression Index (MICI) as the similarity measure. Finally, the performance of two classifiers, namely the least-squares support vector machine (LSSVM) and softmax regression, is monitored, and classifier selection is performed based on five measures along with five-fold cross-validation. The output classes reflect the established Bethesda system of classification for identifying pre-cancerous and cancerous lesions of the cervix. The proposed system is also compared with two existing conventional systems and tested on a publicly available database. Experimental results and comparisons show that the proposed system performs efficiently in Pap smear classification.
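The unsupervised selection step can be illustrated with a short sketch. The Maximal Information Compression Index of a feature pair is the smallest eigenvalue of their 2x2 covariance matrix, so it vanishes when one feature is a linear function of the other. The greedy pruning loop below is a simplified stand-in for the paper's procedure, and the feature matrix and retained-feature count are illustrative assumptions:

```python
import numpy as np

def mici(x, y):
    """Maximal Information Compression Index: the smallest eigenvalue of
    the 2x2 covariance matrix of features x and y. Zero means one feature
    is a linear function of the other (maximally redundant)."""
    return np.linalg.eigvalsh(np.cov(x, y))[0]  # eigvalsh sorts ascending

def select_features(X, k):
    """Simplified greedy pruning: repeatedly drop the feature most
    redundant (lowest MICI) with some other kept feature, until k remain.
    X: (n_samples, n_features) array of CNN-derived features."""
    keep = list(range(X.shape[1]))
    while len(keep) > k:
        redundancy = {i: min(mici(X[:, i], X[:, j]) for j in keep if j != i)
                      for i in keep}
        keep.remove(min(redundancy, key=redundancy.get))
    return keep

# toy usage: 200 samples of a 32-dimensional feature vector, keep 16
X = np.random.randn(200, 32)
print(select_features(X, 16))
```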
{"title":"Pap smear image classification using convolutional neural network","authors":"K. Bora, M. Chowdhury, L. Mahanta, M. Kundu, A. Das","doi":"10.1145/3009977.3010068","DOIUrl":"https://doi.org/10.1145/3009977.3010068","url":null,"abstract":"This article presents the result of a comprehensive study on deep learning based Computer Aided Diagnostic techniques for classification of cervical dysplasia using Pap smear images. All the experiments are performed on a real indigenous image database containing 1611 images, generated at two diagnostic centres. Focus is given on constructing an effective feature vector which can perform multiple level of representation of the features hidden in a Pap smear image. For this purpose Deep Convolutional Neural Network is used, followed by feature selection using an unsupervised technique with Maximal Information Compression Index as similarity measure. Finally performance of two classifiers namely Least Square Support Vector Machine (LSSVM) and Softmax Regression are monitored and classifier selection is performed based on five measures along with five fold cross validation technique. Output classes reflects the established Bethesda system of classification for identifying pre-cancerous and cancerous lesion of cervix. The proposed system is also compared with two existing conventional systems and also tested on a publicly available database. Experimental results and comparison shows that proposed system performs efficiently in Pap smear classification.","PeriodicalId":93806,"journal":{"name":"Proceedings. Indian Conference on Computer Vision, Graphics & Image Processing","volume":"15 1","pages":"55:1-55:8"},"PeriodicalIF":0.0,"publicationDate":"2016-12-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"78090197","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 75
Improving person re-identification systems: a novel score fusion framework for rank-n recognition
Pub Date : 2016-12-18 DOI: 10.1145/3009977.3010018
Arko Barman, S. Shah
Person re-identification is an essential technique for video surveillance applications. Most existing algorithms for person re-identification deal with feature extraction, metric learning or a combination of both. Combining successful state-of-the-art methods using score fusion has not yet been widely explored from the perspective of person re-identification. In this paper, we endeavor to boost the performance of existing systems by combining them using a novel score fusion framework that requires no training or dataset-dependent tuning of parameters. We develop a robust and efficient method called Unsupervised Posterior Probability-based Score Fusion (UPPSF) for combining the raw scores obtained from multiple existing person re-identification algorithms in order to achieve superior recognition rates. We propose two novel generalized linear models for estimating the posterior probabilities of a given probe image matching each of the gallery images. Normalization and combination of these posterior probability values, computed from each of the existing algorithms individually, yield a set of unified scores, which is then used for ranking the gallery images. Our score fusion framework is inherently capable of dealing with the different ranges and distributions of matching scores emanating from existing algorithms, without requiring any prior knowledge about the algorithms themselves, effectively treating them as "black-box" methods. Experiments on widely-used challenging datasets like VIPeR, CUHK01, CUHK03, ETHZ1 and ETHZ2 demonstrate the efficiency of UPPSF in combining multiple algorithms at the score level.
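A minimal sketch of the fusion idea follows: map each algorithm's raw scores to posterior-like probabilities, normalize, average across algorithms, and rank. The logistic link used here is a stand-in for the paper's two generalized linear models, and all names and data in the usage example are illustrative:

```python
import numpy as np

def posterior_scores(raw):
    """Map raw matching scores to pseudo-posterior probabilities with a
    logistic link after z-normalisation (a stand-in for the paper's
    generalized linear models; no training required)."""
    z = (raw - raw.mean()) / (raw.std() + 1e-12)
    return 1.0 / (1.0 + np.exp(-z))

def fuse_and_rank(score_lists):
    """score_lists: list of 1-D arrays, one per algorithm; entry i is that
    algorithm's score for gallery image i against a fixed probe. Returns
    gallery indices sorted from best to worst match."""
    posteriors = [posterior_scores(s) for s in score_lists]
    fused = np.mean(posteriors, axis=0)   # unified score per gallery image
    return np.argsort(-fused)             # descending: best match first

# toy usage: two algorithms scoring a 5-image gallery on different scales
alg1 = np.array([0.10, 0.80, 0.30, 0.55, 0.20])  # similarity in [0, 1]
alg2 = np.array([12.0, 90.0, 40.0, 70.0, 15.0])  # unbounded similarity
print(fuse_and_rank([alg1, alg2]))
```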
{"title":"Improving person re-identification systems: a novel score fusion framework for rank-n recognition","authors":"Arko Barman, S. Shah","doi":"10.1145/3009977.3010018","DOIUrl":"https://doi.org/10.1145/3009977.3010018","url":null,"abstract":"Person re-identification is an essential technique for video surveillance applications. Most existing algorithms for person re-identification deal with feature extraction, metric learning or a combination of both. Combining successful state-of-the-art methods using score fusion from the perspective of person re-identification has not yet been widely explored. In this paper, we endeavor to boost the performance of existing systems by combining them using a novel score fusion framework which requires no training or dataset-dependent tuning of parameters. We develop a robust and efficient method called Unsupervised Posterior Probability-based Score Fusion (UPPSF) for combination of raw scores obtained from multiple existing person re-identification algorithms in order to achieve superior recognition rates. We propose two novel generalized linear models for estimating the posterior probabilities of a given probe image matching each of the gallery images. Normalization and combination of these posterior probability values computed from each of the existing algorithms individually, yields a set of unified scores, which is then used for ranking the gallery images. Our score fusion framework is inherently capable of dealing with different ranges and distributions of matching scores emanating from existing algorithms, without requiring any prior knowledge about the algorithms themselves, effectively treating them as \"black-box\" methods. Experiments on widely-used challenging datasets like VIPeR, CUHK01, CUHK03, ETHZ1 and ETHZ2 demonstrate the efficiency of UPPSF in combining multiple algorithms at the score level.","PeriodicalId":93806,"journal":{"name":"Proceedings. Indian Conference on Computer Vision, Graphics & Image Processing","volume":"239 1","pages":"4:1-4:8"},"PeriodicalIF":0.0,"publicationDate":"2016-12-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"74985051","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 5
Ghosting free HDR for dynamic scenes via shift-maps
Pub Date : 2016-12-18 DOI: 10.1145/3009977.3010034
K. Prabhakar, R. Venkatesh Babu
Given a set of sequential exposures, high dynamic range imaging is a popular method for obtaining high-quality images of fairly static scenes. However, it typically suffers from ghosting artifacts in scenes with significant motion. Also, existing techniques cannot handle heavily saturated regions in the sequence. In this paper, we propose an approach that handles both of these issues. We achieve robustness to motion (both object and camera) and saturation via an energy minimization formulation with spatio-temporal constraints. The proposed approach leverages information from the neighborhood of heavily saturated regions to correct such regions. The experimental results demonstrate the superiority of our method over state-of-the-art techniques on a variety of challenging dynamic scenes.
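The motion-rejection intuition can be sketched as per-pixel consistency weighting against a reference exposure. This toy fusion is not the paper's shift-map energy minimization; the weighting choices, parameter values and function names are assumptions:

```python
import numpy as np

def deghosted_fusion(exposures, ref_idx=0, tau=0.1):
    """Toy ghost-aware fusion of radiometrically aligned exposures (list of
    float32 HxWx3 images in [0, 1]). Each exposure is weighted per pixel by
    (a) its agreement with the reference, so moving objects contribute
    little outside the reference, and (b) how well-exposed it is."""
    ref = exposures[ref_idx]
    fused = np.zeros_like(ref)
    total_w = np.zeros(ref.shape[:2], dtype=np.float32)
    for img in exposures:
        diff = np.abs(img - ref).mean(axis=-1)             # per-pixel mismatch
        w_motion = np.exp(-(diff / tau) ** 2)              # reject movers
        w_exposed = 1.0 - np.abs(img.mean(axis=-1) - 0.5)  # favour mid-tones
        w = w_motion * w_exposed + 1e-6
        fused += w[..., None] * img
        total_w += w
    return fused / total_w[..., None]

# usage: hdr = deghosted_fusion([short_exp, mid_exp, long_exp], ref_idx=1)
```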
{"title":"Ghosting free HDR for dynamic scenes via shift-maps","authors":"K. Prabhakar, R. Venkatesh Babu","doi":"10.1145/3009977.3010034","DOIUrl":"https://doi.org/10.1145/3009977.3010034","url":null,"abstract":"Given a set of sequential exposures, High Dynamic Range imaging is a popular method for obtaining high-quality images for fairly static scenes. However, this typically suffers from ghosting artifacts for scenes with significant motion. Also, existing techniques cannot handle heavily saturated regions in the sequence. In this paper, we propose an approach that handles both the issues mentioned above. We achieve robustness to motion (both object and camera) and saturation via an energy minimization formulation with spatio-temporal constraints. The proposed approach leverages information from the neighborhood of heavily saturated regions to correct such regions. The experimental results demonstrate the superiority of our method over state-of-the-art techniques for a variety of challenging dynamic scenes.","PeriodicalId":93806,"journal":{"name":"Proceedings. Indian Conference on Computer Vision, Graphics & Image Processing","volume":"19 1","pages":"57:1-57:8"},"PeriodicalIF":0.0,"publicationDate":"2016-12-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"75306389","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 2
Video stabilization by procrustes analysis of trajectories
Pub Date : 2016-12-18 DOI: 10.1145/3009977.3009989
Geethu Miriam Jacob, Sukhendu Das
Video stabilization algorithms are often necessary at the pre-processing stage of many video analytics applications. The major challenges in video stabilization are jittery camera motion paths, large foreground moving objects with arbitrary motion, and occlusions. In this paper, a simple yet powerful video stabilization algorithm is proposed that eliminates trajectories exhibiting higher dynamism due to jitter. A block-wise stabilization of the camera motion is performed by analyzing the trajectories in Kendall's shape space. A 3-stage iterative process is proposed for each block of frames. In the first stage of the iterative process, the trajectories with relatively higher dynamism (estimated using optical flow) are eliminated. In the second stage, a Procrustes alignment is performed on the remaining trajectories and the Fréchet mean of the aligned trajectories is estimated. Finally, the Fréchet mean is stabilized, and a transformation of the stabilized Fréchet mean back to the original trajectory space yields the stabilized trajectories. A global optimization function is designed for stabilization, minimizing wobbles and distortions in the frames. As the motion paths of the higher- and lower-dynamism regions become more distinct after stabilization, this iterative process helps identify the stabilized background trajectories (with lower dynamism), which are used to warp the frames when rendering the stabilized output. Experiments are conducted with varying levels of jitter introduced into stable videos, in addition to a few benchmark natural jittery videos. Where synthetic jitter is fused onto stable videos, an error norm comparing the ground-truth scores (scores of the stable videos) with the scores of the stabilized videos is used for a comparative study of performance. The results show the superiority of the proposed method over other state-of-the-art methods.
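A minimal sketch of the second-stage machinery, assuming trajectories stored as (T, 2) arrays of point coordinates over T frames: classical orthogonal Procrustes alignment via SVD, plus the standard iterative approximation of the Fréchet mean in Kendall's shape space. The iteration count is an assumption:

```python
import numpy as np

def procrustes_align(A, B):
    """Align trajectory B to trajectory A (both (T, 2) arrays): remove
    translation and scale, then apply the optimal rotation from the SVD
    solution of the orthogonal Procrustes problem."""
    A0 = A - A.mean(axis=0)
    B0 = B - B.mean(axis=0)
    A0 /= np.linalg.norm(A0)
    B0 /= np.linalg.norm(B0)
    U, _, Vt = np.linalg.svd(B0.T @ A0)
    return B0 @ (U @ Vt)          # rotated copy of B0, closest to A0

def frechet_mean(trajs, iters=10):
    """Approximate Frechet mean of trajectory shapes: repeatedly align all
    trajectories to the current mean and re-average, renormalising so the
    estimate stays a unit-size pre-shape."""
    mean = trajs[0] - trajs[0].mean(axis=0)
    mean /= np.linalg.norm(mean)
    for _ in range(iters):
        mean = np.mean([procrustes_align(mean, t) for t in trajs], axis=0)
        mean /= np.linalg.norm(mean)
    return mean

# toy usage: five noisy copies of one 30-frame trajectory
base = np.cumsum(np.random.randn(30, 2), axis=0)
trajs = [base + 0.05 * np.random.randn(30, 2) for _ in range(5)]
print(frechet_mean(trajs).shape)  # (30, 2)
```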
{"title":"Video stabilization by procrustes analysis of trajectories","authors":"Geethu Miriam Jacob, Sukhendu Das","doi":"10.1145/3009977.3009989","DOIUrl":"https://doi.org/10.1145/3009977.3009989","url":null,"abstract":"Video Stabilization algorithms are often necessary at the pre-processing stage for many applications in video analytics. The major challenges in video stabilization are the presence of jittery motion paths of a camera, large foreground moving objects with arbitrary motion and occlusions. In this paper, a simple, yet powerful video stabilization algorithm is proposed, by eliminating the trajectories with higher dynamism appearing due to jitter. A block-wise stabilization of the camera motion is performed, by analyzing the trajectories in Kendall's shape space. A 3-stage iterative process is proposed for each block of frames. At the first stage of the iterative process, the trajectories with relatively higher dynamism (estimated using optical flow) are eliminated. At the second stage, a Procrustes alignment is performed on the remaining trajectories and Frechet mean of the aligned trajectories is estimated. Finally, the Frechet mean is stabilized and a transformation of the stabilized Frechet mean to the original space (of the trajectories) yields the stabilized trajectories. A global optimization function has been designed for stabilization, thus minimizing wobbles and distortions in the frames. As the motion paths of the higher and lower dynamic regions become more distinct after stabilization, this iterative process helps in the identification of the stabilized background trajectories (with lower dynamism), which are used to warp the frames for rendering the stabilized frames. Experiments are done with varying levels of jitter introduced on stable videos, apart from a few benchmarked natural jittery videos. In cases, where synthetic jitter is fused on stable videos, an error norm comparing the groundtruth scores (scores of the stable videos) to the scores of the stabilized videos, is used for comparative study of performance. The results show the superiority of our proposed method over other state-of-the-art methods.","PeriodicalId":93806,"journal":{"name":"Proceedings. Indian Conference on Computer Vision, Graphics & Image Processing","volume":"68 1","pages":"47:1-47:8"},"PeriodicalIF":0.0,"publicationDate":"2016-12-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"74305912","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1
Blind image quality assessment using subspace alignment
Pub Date : 2016-12-18 DOI: 10.1145/3009977.3010014
I. Kiran, T. Guha, Gaurav Pandey
This paper addresses the problem of estimating the quality of an image as it would be perceived by a human. A well-accepted approach to assessing the perceptual quality of an image is to quantify its loss of structural information. We propose a blind image quality assessment method that aims at quantifying the structural information loss in a given (possibly distorted) image by comparing its structures with those extracted from a database of clean images. We first construct a subspace from the clean natural images using (i) principal component analysis (PCA), and (ii) overcomplete dictionary learning with a sparsity constraint. While PCA provides mathematical convenience, an overcomplete dictionary is known to capture perceptually important structures resembling the simple cells in the primary visual cortex. The subspace learned from the clean images is called the source subspace. Similarly, a subspace, called the target subspace, is learned from the distorted image. In order to quantify the structural information loss, we use a subspace alignment technique which transforms the target subspace into the source subspace by optimizing over a transformation matrix. This transformation matrix is subsequently used to measure the global and local (patch-based) quality scores of the distorted image. The quality scores obtained by the proposed method are shown to correlate well with the subjective scores obtained from human annotators. Our method achieves competitive results when evaluated on three benchmark databases.
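A toy sketch of the alignment step, assuming patch sets as row-vector matrices and subspaces as PCA column bases: M = S^T T aligns the source basis S to the target basis T in the least-squares sense, and the alignment residual stands in for a quality score here (the paper's local patch-based scoring is omitted):

```python
import numpy as np
from sklearn.decomposition import PCA

def subspace(patches, d=8):
    """Top-d PCA basis (as columns) of vectorised image patches."""
    return PCA(n_components=d).fit(patches).components_.T

def alignment_residual(S, T):
    """Subspace alignment: M = S^T T is the least-squares optimal matrix
    taking the source basis S toward the target basis T; the remaining
    Frobenius residual grows as the test image's structures deviate from
    clean-image structures (used here as a structural-loss proxy)."""
    M = S.T @ T
    return np.linalg.norm(S @ M - T, "fro")

# toy usage with random stand-ins for 8x8 patches (64-dim row vectors)
clean_patches = np.random.randn(500, 64)  # from a clean-image database
test_patches = np.random.randn(300, 64)   # from the image under test
print(alignment_residual(subspace(clean_patches), subspace(test_patches)))
```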
{"title":"Blind image quality assessment using subspace alignment","authors":"I. Kiran, T. Guha, Gaurav Pandey","doi":"10.1145/3009977.3010014","DOIUrl":"https://doi.org/10.1145/3009977.3010014","url":null,"abstract":"This paper addresses the problem of estimating the quality of an image as it would be perceived by a human. A well accepted approach to assess perceptual quality of an image is to quantify its loss of structural information. We propose a blind image quality assessment method that aims at quantifying structural information loss in a given (possibly distorted) image by comparing its structures with those extracted from a database of clean images. We first construct a subspace from the clean natural images using (i) principal component analysis (PCA), and (ii) overcomplete dictionary learning with sparsity constraint. While PCA provides mathematical convenience, an overcomplete dictionary is known to capture the perceptually important structures resembling the simple cells in the primary visual cortex. The subspace learned from the clean images is called the source subspace. Similarly, a subspace, called the target subspace, is learned from the distorted image. In order to quantify the structural information loss, we use a subspace alignment technique which transforms the target subspace into the source by optimizing over a transformation matrix. This transformation matrix is subsequently used to measure the global and local (patch-based) quality score of the distorted image. The quality scores obtained by the proposed method are shown to correlate well with the subjective scores obtained from human annotators. Our method achieves competitive results when evaluated on three benchmark databases.","PeriodicalId":93806,"journal":{"name":"Proceedings. Indian Conference on Computer Vision, Graphics & Image Processing","volume":"8 1","pages":"91:1-91:6"},"PeriodicalIF":0.0,"publicationDate":"2016-12-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"86145755","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 3
Adaptive artistic stylization of images
Pub Date : 2016-12-18 DOI: 10.1145/3009977.3009985
Ameya Deshpande, S. Raman
In this work, we present a novel non-photorealistic rendering method that produces good-quality stylization results for color images. The procedure is driven by a saliency measure in the foreground and background regions. We start by generating a saliency map and a simple thresholding-based segmentation to obtain a rough estimate of the foreground-background mask. We improve this mask using a scribble-based method in which the scribbles for the foreground and background regions are generated automatically from the previous rough estimate. After mask generation, we proceed with an iterative abstraction process involving edge-preserving blurring and edge detection. The number of abstraction iterations performed in the foreground and background regions is decided by tracking the changes in the saliency measure in each region. Performing an unequal number of iterations helps increase the average saliency in the more salient region (foreground) while decreasing it in the non-salient region (background). Implementation results show the merits of this approach compared with other competing methods.
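A simplified sketch of the region-wise abstraction loop, assuming OpenCV and a binary foreground mask; the filter parameters, iteration counts and edge overlay are illustrative choices, not the paper's tuned settings:

```python
import cv2
import numpy as np

def abstract_region(img, mask, iters):
    """Run edge-preserving blur only inside a binary region mask, iters
    times, then darken detected edges within that region (a simplified
    version of the abstraction step)."""
    out = img.copy()
    for _ in range(iters):
        blurred = cv2.bilateralFilter(out, d=9, sigmaColor=50, sigmaSpace=7)
        out = np.where(mask[..., None] > 0, blurred, out)
    edges = cv2.Canny(cv2.cvtColor(out, cv2.COLOR_BGR2GRAY), 60, 120)
    out[(edges > 0) & (mask > 0)] = 0
    return out

# toy usage: random stand-in image and a rectangular "foreground" mask
img = np.random.randint(0, 255, (400, 600, 3), dtype=np.uint8)
fg = np.zeros(img.shape[:2], dtype=np.uint8)
fg[100:300, 150:450] = 255
stylized = abstract_region(img, fg, iters=2)             # fewer passes: fg
stylized = abstract_region(stylized, 255 - fg, iters=5)  # more passes: bg
```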
{"title":"Adaptive artistic stylization of images","authors":"Ameya Deshpande, S. Raman","doi":"10.1145/3009977.3009985","DOIUrl":"https://doi.org/10.1145/3009977.3009985","url":null,"abstract":"In this work, we present a novel non-photorealistic rendering method which produces good quality stylization results for color images. The procedure is driven by saliency measure in the foreground and the background region. We start with generating saliency map and simple thresholding based segmentation to get rough estimation of the foreground-background mask. We improve this mask by using a scribble-based method where the scribbles for foreground-background regions are automatically generated from the previous rough estimation. Followed by the mask generation, we proceed with an iterative abstraction process which involves edge-preserving blurring and edge detection. The number of iterations of the abstraction process to be performed in the foreground and background regions are decided by tracking the changes in saliency measure in the foreground and the background regions. Performing unequal number of iterations helps to improve the average saliency measure in more salient region (foreground) while decreasing the average saliency measure in the non-salient region (background). Implementation results of our method shows the merits of this approach with other competing methods.","PeriodicalId":93806,"journal":{"name":"Proceedings. Indian Conference on Computer Vision, Graphics & Image Processing","volume":"68 1","pages":"3:1-3:8"},"PeriodicalIF":0.0,"publicationDate":"2016-12-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"72807842","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 3
A large scale dataset for classification of vehicles in urban traffic scenes
Pub Date : 2016-12-18 DOI: 10.1145/3009977.3010040
H. S. Bharadwaj, S. Biswas, K. Ramakrishnan
Vehicle classification has been a well-researched topic in the recent past. However, advances in the field have not translated into deployment in intelligent traffic management, due to the non-availability of surveillance-quality visual data of vehicles at urban traffic junctions. In this paper, we present a dataset aimed at exploring vehicle classification and related problems in dense, urban traffic scenarios. We present our ongoing effort of collecting a large-scale, surveillance-quality dataset of vehicles seen mostly on Indian roads. The dataset is an extensive collection of vehicles under different poses, scales and illumination conditions, in addition to a smaller set of near-infrared spectrum images for night-time and low-light traffic surveillance. We will make the dataset available for further research in this area. We propose and evaluate a few baseline algorithms for the task of vehicle classification on this dataset. We also discuss challenges and potential applications of the data.
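One plausible baseline in the spirit of this evaluation is a HOG descriptor with a linear SVM over fixed-size vehicle crops; the paper's actual baselines are not reproduced here, so the feature settings and four-way class set below are assumptions:

```python
import numpy as np
from skimage.feature import hog
from skimage.transform import resize
from sklearn.svm import LinearSVC
from sklearn.model_selection import cross_val_score

def hog_features(images, size=(64, 64)):
    """Resize each grayscale crop to a fixed size and extract HOG."""
    return np.array([hog(resize(im, size), orientations=9,
                         pixels_per_cell=(8, 8), cells_per_block=(2, 2))
                     for im in images])

# toy usage: random stand-ins for cropped vehicle images and labels
crops = [np.random.rand(120, 160) for _ in range(100)]
labels = np.random.randint(0, 4, size=100)  # e.g. car/bus/truck/two-wheeler
X = hog_features(crops)
print(cross_val_score(LinearSVC(), X, labels, cv=5).mean())
```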
{"title":"A large scale dataset for classification of vehicles in urban traffic scenes","authors":"H. S. Bharadwaj, S. Biswas, K. Ramakrishnan","doi":"10.1145/3009977.3010040","DOIUrl":"https://doi.org/10.1145/3009977.3010040","url":null,"abstract":"Vehicle Classification has been a well-researched topic in the recent past. However, advances in the field have not been corroborated with deployment in Intelligent Traffic Management, due to non-availability of surveillance quality visual data of vehicles in urban traffic junctions. In this paper, we present a dataset aimed at exploring Vehicle Classification and related problems in dense, urban traffic scenarios. We present our on-going effort of collecting a large scale, surveillance quality, dataset of vehicles seen mostly on Indian roads. The dataset is an extensive collection of vehicles under different poses, scales and illumination conditions in addition to a smaller set of Near Infrared spectrum images for night time and low light traffic surveillance. We will make the dataset available for further research in this area. We propose and evaluate few baseline algorithms for the task of vehicle classification on this dataset. We also discuss challenges and potential applications of the data.","PeriodicalId":93806,"journal":{"name":"Proceedings. Indian Conference on Computer Vision, Graphics & Image Processing","volume":"16 1","pages":"83:1-83:8"},"PeriodicalIF":0.0,"publicationDate":"2016-12-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"78611748","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 12
Depth estimation from single image using machine learning techniques
Pub Date : 2016-12-18 DOI: 10.1145/3009977.3010019
Nidhi Chahal, Meghna Pippal, S. Chaudhury
In this paper, the problem of depth estimation from a single monocular image is considered. Depth cues such as motion and stereo correspondences are not present in a single image, which makes the task more challenging. We propose a machine learning based approach for extracting depth information from a single image. Deep learning is used for extracting features; initial depths are then generated using manifold learning, in which the neighborhood preserving embedding algorithm is used. Fixed-point supervised learning is then applied for sequential labeling to obtain more consistent and accurate depth maps. The features used are the initial depths obtained from manifold learning and various image-based features, including texture, color and edges, which provide useful information about depth. A fixed-point contraction mapping function is generated, using which the depth map is predicted for a new structured input image. A transfer learning approach is also used to improve learning on a new task by transferring knowledge from a related task that has already been learned. The predicted depth maps are reliable, accurate and very close to the ground-truth depths, which is validated using the objective measures RMSE, PSNR and SSIM and the subjective measure MOS score.
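A loose sketch of the fixed-point idea, assuming per-pixel features and imperfect initial depths from the manifold-learning stage: a regressor repeatedly re-predicts depth from the image features concatenated with the current depth estimate. The regressor choice and training scheme are simplifications, not the paper's contraction mapping:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

def fixed_point_depth(feats, init_depth, model, iters=5):
    """Iterate depth = f(features, current depth); if f behaves like a
    contraction, the estimate settles toward a consistent depth map.
    feats: (N, F) per-pixel features; init_depth: (N,) initial depths."""
    depth = init_depth.copy()
    for _ in range(iters):
        depth = model.predict(np.column_stack([feats, depth]))
    return depth

# toy usage: train f on pixels with known depth, then refine a coarse start
F, N = 16, 5000
feats = np.random.randn(N, F)
gt = np.random.rand(N)                 # stand-in ground-truth depths
noisy = gt + 0.1 * np.random.randn(N)  # imitates imperfect initial depths
model = RandomForestRegressor(n_estimators=50).fit(
    np.column_stack([feats, noisy]), gt)
refined = fixed_point_depth(feats, np.full(N, gt.mean()), model)
```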
{"title":"Depth estimation from single image using machine learning techniques","authors":"Nidhi Chahal, Meghna Pippal, S. Chaudhury","doi":"10.1145/3009977.3010019","DOIUrl":"https://doi.org/10.1145/3009977.3010019","url":null,"abstract":"In this paper, the problem of depth estimation from single monocular image is considered. The depth cues such as motion, stereo correspondences are not present in single image which makes the task more challenging. We propose a machine learning based approach for extracting depth information from single image. The deep learning is used for extracting features, then, initial depths are generated using manifold learning in which neighborhood preserving embedding algorithm is used. Then, fixed point supervised learning is applied for sequential labeling to obtain more consistent and accurate depth maps. The features used are initial depths obtained from manifold learning and various image based features including texture, color and edges which provide useful information about depth. A fixed point contraction mapping function is generated using which depth map is predicted for new structured input image. The transfer learning approach is also used for improvement in learning in a new task through the transfer of knowledge from a related task that has already been learned. The predicted depth maps are reliable, accurate and very close to ground truth depths which is validated using objective measures: RMSE, PSNR, SSIM and subjective measure: MOS score.","PeriodicalId":93806,"journal":{"name":"Proceedings. Indian Conference on Computer Vision, Graphics & Image Processing","volume":"87 1","pages":"19:1-19:8"},"PeriodicalIF":0.0,"publicationDate":"2016-12-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"76825275","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 3
COMA-boost: co-operative multi agent AdaBoost
Pub Date : 2016-12-18 DOI: 10.1145/3009977.3009997
A. Lahiri, Biswajit Paria, P. Biswas
Multi-feature-space representation is common practice in computer vision applications. Traditional features such as HOG, SIFT and SURF each encapsulate certain discriminative cues for visual classification. On the other hand, each layer of a deep neural network generates representations of multiple orders. In this paper we present a novel approach for such multi-feature representation learning using Adaptive Boosting (AdaBoost). General practice in AdaBoost [8] is to concatenate the components of the feature spaces and train base learners, treating examples simply as correctly or incorrectly classified. We posit that multi-feature-space learning should instead be viewed as a derivative of cooperative multi-agent learning. To this end, we propose a mathematical framework to leverage the performance of base learners over each feature space, gauge a measure of the "difficulty" of the training space, and finally make soft weight updates rather than the strict binary weight updates prevalent in regular AdaBoost. This is made possible by periodic sharing of response states by our learner agents in the boosting framework. Theoretically, such a soft weight-update policy allows infinitely many combinations of weight updates on the training space, compared with only two possibilities in AdaBoost. This opens up the opportunity to distinguish 'more difficult' examples from 'less difficult' ones. We test our model on traditional multi-feature representations of the MNIST handwritten character dataset and the 100-Leaves classification challenge. We consistently outperform traditional multi-view boosting and its variants in terms of accuracy, while margin analysis reveals that the proposed method fosters the formation of a more confident ensemble of learner agents. As an application of using our model in conjunction with a deep neural network, we test it on the challenging task of retinal blood vessel segmentation from fundus images of the DRIVE dataset, using kernel dictionaries from the layers of an unsupervised trained stacked autoencoder network. Our work opens a new avenue of research for combining a popular statistical machine learning paradigm with deep network architectures.
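A sketch of one boosting round under this reading of the method: one base learner per feature space ("view") trains on a shared weight vector, and the update is graded by the average margin across views instead of AdaBoost's binary correct/incorrect rule. This is an interpretation of the soft-update idea, not the paper's exact formulation:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def coma_boost_round(views, y, w):
    """views: list of (n, d_v) feature matrices over the same n examples;
    y: labels; w: shared example weights. Returns the per-view learners
    and softly updated, renormalised weights."""
    margins = np.zeros(len(y))
    learners = []
    for X in views:
        stump = DecisionTreeClassifier(max_depth=1).fit(X, y, sample_weight=w)
        margins += np.where(stump.predict(X) == y, 1.0, -1.0)
        learners.append(stump)
    margins /= len(views)     # graded agreement in [-1, 1], not binary
    w = w * np.exp(-margins)  # soft update: harder examples gain weight
    return learners, w / w.sum()

# toy usage: two 50-dimensional views of 300 examples, binary labels
views = [np.random.randn(300, 50), np.random.randn(300, 50)]
y = np.random.randint(0, 2, 300)
w = np.full(300, 1 / 300)
for _ in range(10):
    learners, w = coma_boost_round(views, y, w)
```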
{"title":"COMA-boost: co-operative multi agent AdaBoost","authors":"A. Lahiri, Biswajit Paria, P. Biswas","doi":"10.1145/3009977.3009997","DOIUrl":"https://doi.org/10.1145/3009977.3009997","url":null,"abstract":"Multi feature space representation is a common practise in computer vision applications. Traditional features such as HOG, SIFT, SURF etc., individually encapsulates certain discriminative cues for visual classification. On the other hand, each layer of a deep neural network generates multi ordered representations. In this paper we present a novel approach for such multi feature representation learning using Adaptive Boosting (AdaBoost). General practise in AdaBoost [8] is to concatenate components of feature spaces and train base learners to classify examples as correctly/incorrectly classified. We posit that multi feature space learning should be viewed as a derivative of cooperative multi agent learning. To this end, we propose a mathematical framework to leverage performance of base learners over each feature space, gauge a measure of \"difficulty\" of training space and finally make soft weight updates rather than strict binary weight updates prevalent in regular AdaBoost. This is made possible by periodically sharing of response states by our learner agents in the boosting framework. Theoretically, such soft weight update policy allows infinite combinations of weight updates on training space compared to only two possibilities in AdaBoost. This opens up the opportunity to identify 'more difficult' examples compared to 'less difficult' examples. We test our model on traditional multi feature representation of MNIST handwritten character dataset and 100-Leaves classification challenge. We consistently outperform traditional and variants of multi view boosting in terms of accuracy while margin analysis reveals that proposed method fosters formation of more confident ensemble of learner agents. As an application of using our model in conjecture with deep neural network, we test our model on the challenging task of retinal blood vessel segmentation from fundus images of DRIVE dataset by using kernel dictionaries from layers of unsupervised trained stacked autoencoder network. Our work opens a new avenue of research for combining a popular statistical machine learning paradigm with deep network architectures.","PeriodicalId":93806,"journal":{"name":"Proceedings. Indian Conference on Computer Vision, Graphics & Image Processing","volume":"4 1","pages":"43:1-43:8"},"PeriodicalIF":0.0,"publicationDate":"2016-12-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"79847842","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Document blur detection using edge profile mining
Pub Date : 2016-12-18 DOI: 10.1145/3009977.3009982
S. Maheshwari, P. Rai, Gopal Sharma, Vineet Gandhi
We present an algorithm for automatic blur detection in document images using a novel approach based on edge intensity profiles. Our main insight is that edge profiles are a strong indicator of the blur present in the image, with steep profiles implying sharper regions and gradual profiles implying blurred regions. Our approach first retrieves the profile for each point of intensity transition (each edge point) along the gradient, and then uses these profiles to output a quantitative measure indicating the extent of blur in the input image. The real-time performance of the proposed approach makes it suitable for most applications. Additionally, our method works for both handwritten and digital documents and is agnostic to font type and size, which gives it a major advantage over the currently prevalent learning-based approaches. Extensive quantitative and qualitative experiments on two different datasets show that our method outperforms almost all algorithms in the current state of the art by a significant margin, especially in cross-dataset experiments.
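A minimal sketch of an edge-profile blur measure in this spirit: at each strong-gradient pixel, walk outward until the intensity transition flattens, and average the transition widths. For simplicity the profile is traced along image rows rather than the exact gradient direction, and the thresholds are assumptions:

```python
import numpy as np
import cv2

def edge_profile_blur(gray, grad_thresh=40, max_width=16):
    """Blur score for a grayscale document image: mean width of intensity
    transitions at strong horizontal-gradient pixels (wider transitions
    imply more blur). Returns 0.0 if no edges are found."""
    gx = cv2.Sobel(gray, cv2.CV_32F, 1, 0, ksize=3)
    ys, xs = np.where(np.abs(gx) > grad_thresh)
    img = gray.astype(np.int32)
    widths = []
    for y, x in zip(ys, xs):
        sign = 1 if gx[y, x] > 0 else -1
        left = right = x
        # extend while intensity keeps changing in the edge's direction
        while left > 0 and x - left < max_width and \
                sign * (img[y, left] - img[y, left - 1]) > 0:
            left -= 1
        while right < img.shape[1] - 1 and right - x < max_width and \
                sign * (img[y, right + 1] - img[y, right]) > 0:
            right += 1
        widths.append(right - left)
    return float(np.mean(widths)) if widths else 0.0

# usage: edge_profile_blur(cv2.imread("doc.png", cv2.IMREAD_GRAYSCALE))
```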
{"title":"Document blur detection using edge profile mining","authors":"S. Maheshwari, P. Rai, Gopal Sharma, Vineet Gandhi","doi":"10.1145/3009977.3009982","DOIUrl":"https://doi.org/10.1145/3009977.3009982","url":null,"abstract":"We present an algorithm for automatic blur detection of document images using a novel approach based on edge intensity profiles. Our main insight is that the edge profiles are a strong indicator of the blur present in the image, with steep profiles implying sharper regions and gradual profiles implying blurred regions. Our approach first retrieves the profiles for each point of intensity transition (each edge point) along the gradient and then uses them to output a quantitative measure indicating the extent of blur in the input image. The real time performance of the proposed approach makes it suitable for most applications. Additionally, our method works for both hand written and digital documents and is agnostic to the font types and sizes, which gives it a major advantage over the currently prevalent learning based approaches. Extensive quantitative and qualitative experiments over two different datasets show that our method outperforms almost all algorithms in current state of the art by a significant margin, especially in cross dataset experiments.","PeriodicalId":93806,"journal":{"name":"Proceedings. Indian Conference on Computer Vision, Graphics & Image Processing","volume":"45 1","pages":"23:1-23:7"},"PeriodicalIF":0.0,"publicationDate":"2016-12-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"90632788","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 6