
Latest publications: Proceedings. Indian Conference on Computer Vision, Graphics & Image Processing

Automated brain tractography segmentation using curvature points
Pub Date : 2016-12-18 DOI: 10.1145/3009977.3010013
Vedang Patel, Anand Parmar, A. Bhavsar, A. Nigam
Classification of brain fiber tracts is an important problem in brain tractography analysis. We propose a supervised algorithm which learns features for anatomically meaningful fiber clusters from labeled DTI white matter data. The classification is performed at two levels: a) grey vs. white matter (macro level) and b) white matter clusters (micro level). Our approach focuses on high-curvature points in the fiber tracts, which embody the unique characteristics of the respective classes. Any test fiber is classified into one of these learned classes by comparing proximity using the learned curvature-point model (at the micro level) and with a neural network classifier (at the macro level). The proposed algorithm has been validated on brain DTI data for three subjects containing about 250,000 fibers per subject, and is shown to yield high classification accuracy (> 93%) at both macro and micro levels.
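The curvature-point idea in this abstract can be illustrated with a small sketch: approximate curvature along a fiber polyline by the turning angle between successive segments, and keep the points whose angle exceeds a threshold. This is a toy stand-in, not the authors' implementation; the 2-D polyline and the 30-degree threshold are assumptions.

```python
import math

def curvature_points(fiber, angle_thresh_deg=30.0):
    """Return indices of high-curvature points along a fiber polyline.

    Curvature is approximated by the turning angle between successive
    segments; points whose turning angle exceeds the threshold are kept.
    """
    picked = []
    for i in range(1, len(fiber) - 1):
        ax, ay = fiber[i][0] - fiber[i-1][0], fiber[i][1] - fiber[i-1][1]
        bx, by = fiber[i+1][0] - fiber[i][0], fiber[i+1][1] - fiber[i][1]
        na, nb = math.hypot(ax, ay), math.hypot(bx, by)
        if na == 0 or nb == 0:
            continue  # degenerate (repeated) points carry no turning angle
        cosang = max(-1.0, min(1.0, (ax * bx + ay * by) / (na * nb)))
        if math.degrees(math.acos(cosang)) > angle_thresh_deg:
            picked.append(i)
    return picked

# An L-shaped tract: the sharp bend at index 2 should be detected.
fiber = [(0, 0), (1, 0), (2, 0), (2, 1), (2, 2)]
print(curvature_points(fiber))  # [2]
```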
Pages: 18:1-18:6
Citations: 5
Hierarchical structured learning for indoor autonomous navigation of Quadcopter
Pub Date : 2016-12-18 DOI: 10.1145/3009977.3009990
Vishakh Duggal, K. Bipin, Utsav Shah, K. Krishna
Autonomous navigation of a generic monocular quadcopter in indoor environments requires sophisticated approaches for perception, planning and control. This paper presents a system which enables a miniature quadcopter with a frontal monocular camera to autonomously navigate and explore unknown indoor environments. First, the system estimates a dense depth map of the environment from a single video frame using our proposed supervised Hierarchical Structured Learning (HSL) technique, which yields both high accuracy and better generalization. The proposed HSL approach discretizes the overall depth range into multiple sets. It structures these sets hierarchically and recursively by partitioning the set of classes into two subsets, each subset representing an apportioned depth range of the parent set, forming a binary tree. A binary classifier, a Support Vector Machine (SVM), is trained separately for each internal node of the binary tree. Depth estimation for each pixel then proceeds top-down from the root node, classifying repeatedly until it reaches a leaf node representing the estimated depth. The generated depth map is provided as input to a Convolutional Neural Network (CNN), which generates flight planning commands. Finally, a trajectory planning and control module employs a convex programming technique to generate a collision-free minimum-time trajectory which follows these flight planning commands and produces appropriate control inputs for the quadcopter. The results convey unequivocally the advantages of depth perception by HSL, while repeatable successful flights in typical indoor corridors confirm the efficacy of the pipeline.
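The top-down binary-tree traversal described above can be sketched as follows. Simple threshold functions stand in for the per-node SVMs, and the four-class toy tree over a scalar feature is an assumption purely for illustration.

```python
# A node is either ("leaf", depth_class) or (classify_fn, left, right).
def classify_depth(node, x):
    """Top-down HSL-style traversal: at each internal node a binary
    classifier routes the pixel feature x into the subtree covering the
    nearer or farther half of the parent's depth range, until a leaf
    (a depth class) is reached."""
    while node[0] != "leaf":
        fn, left, right = node
        node = left if fn(x) == 0 else right
    return node[1]

# Toy tree over four depth classes 0..3; threshold stubs replace the
# per-node SVMs used in the paper.
tree = (lambda x: 0 if x < 0.5 else 1,
        (lambda x: 0 if x < 0.25 else 1, ("leaf", 0), ("leaf", 1)),
        (lambda x: 0 if x < 0.75 else 1, ("leaf", 2), ("leaf", 3)))

print([classify_depth(tree, v) for v in (0.1, 0.3, 0.6, 0.9)])  # [0, 1, 2, 3]
```

Each query touches only log2(#classes) classifiers, which is what makes the hierarchical structure cheaper than one flat multi-class decision per pixel.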
Pages: 13:1-13:8
Citations: 3
Deep image inpainting with region prediction at hierarchical scales
Pub Date : 2016-12-18 DOI: 10.1145/3009977.3009992
Souradeep Chakraborty, Jogendra Nath Kundu, R. Venkatesh Babu
In this paper, we propose a CNN-based method for image inpainting which utilizes the inpaintings generated at different hierarchical resolutions. First, we predict the missing image region with larger contextual information at the lowest resolution using deconvolution layers. Next, we refine the predicted region at greater hierarchical scales by imposing gradually reduced contextual information surrounding the predicted region, training a different CNN at each scale. Thus, our method not only utilizes information from different hierarchical resolutions but also intelligently leverages the context information at each level of the hierarchy to produce a better inpainted image. The individual models are trained jointly, using loss functions placed at intermediate layers. Finally, the CNN-generated image region is sharpened using the unsharp masking operation, followed by intensity matching with the contextual region, to produce visually consistent and appealing inpaintings with more prominent edges. Comparison of our method with well-known inpainting methods on the Caltech-101 objects dataset demonstrates the quantitative and qualitative strengths of our method over the others.
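The unsharp-masking post-processing step mentioned above can be sketched as below: add back an amplified difference between the image and a blurred copy. A 3x3 box blur and an amount of 1.0 are assumptions; the paper does not specify the kernel, and the intensity-matching step is omitted here.

```python
import numpy as np

def unsharp_mask(img, amount=1.0):
    """Sharpen via unsharp masking: img + amount * (img - blurred(img)).

    A 3x3 box blur (edge-padded) stands in for the unspecified smoothing
    kernel; output is clipped to the [0, 1] intensity range.
    """
    h, w = img.shape
    pad = np.pad(img, 1, mode="edge")
    blurred = sum(pad[i:i + h, j:j + w]
                  for i in range(3) for j in range(3)) / 9.0
    return np.clip(img + amount * (img - blurred), 0.0, 1.0)

# A vertical step edge: sharpening increases the contrast across it.
img = np.full((4, 4), 0.2)
img[:, 2:] = 0.8
print(unsharp_mask(img)[1])  # edge pixels pushed toward 0.0 and 1.0
```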
Pages: 33:1-33:8
Citations: 0
Limitations with measuring performance of techniques for abnormality localization in surveillance video and how to overcome them?
Pub Date : 2016-12-18 DOI: 10.1145/3009977.3010044
M. Sharma, S. Sarcar, D. Sheet, P. Biswas
Nowadays video surveillance is becoming more popular due to global security concerns and the increasing need for effective monitoring of public places. The key goal of video surveillance is to detect suspicious or abnormal behavior, and various efforts have been made to detect abnormalities in video. Alongside these advancements, there is a need for better techniques for evaluating abnormality localization in video surveillance. Existing techniques mainly use a forty percent overlap rule with ground-truth data and do not consider extra predicted regions in the computation. Existing metrics have been found to be inaccurate when more than one region is present within the frame, each of which may or may not be correctly localized or marked as abnormal. This work attempts to bridge these limitations in existing metrics. In this paper, we investigate three existing metrics and discuss their benefits and limitations for evaluating localization of abnormality in video. We further extend the existing work by introducing penalty functions and substantiate the validity of the proposed metrics with a sufficient number of instances. The presented metrics are validated on data (35 different situations) for which the overlap has been computed analytically.
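The forty-percent overlap rule, and the idea of penalizing predicted area that covers no ground truth, can be sketched as follows. The penalty form and weight here are an illustrative reading, not the paper's actual penalty functions, and the axis-aligned box representation is an assumption.

```python
def area(b):
    # b = (x1, y1, x2, y2)
    return max(0, b[2] - b[0]) * max(0, b[3] - b[1])

def inter(a, b):
    w = min(a[2], b[2]) - max(a[0], b[0])
    h = min(a[3], b[3]) - max(a[1], b[1])
    return max(0, w) * max(0, h)

def localization_score(gt, pred, overlap_thresh=0.4, penalty_weight=0.5):
    """Overlap-based abnormality-localization score with a penalty term.

    A ground-truth region counts as localized when some predicted region
    overlaps at least 40% of it (the classic rule). The penalty term (an
    illustrative assumption) subtracts a fraction of the predicted area
    that covers no ground truth at all.
    """
    hits = sum(1 for g in gt
               if any(inter(g, p) >= overlap_thresh * area(g) for p in pred))
    recall = hits / len(gt) if gt else 1.0
    pred_area = sum(area(p) for p in pred)
    covered = sum(max((inter(p, g) for g in gt), default=0) for p in pred)
    extra = 1.0 - covered / pred_area if pred_area else 0.0
    return recall - penalty_weight * extra

# A spurious extra prediction now lowers the score instead of being ignored.
print(localization_score([(0, 0, 10, 10)], [(0, 0, 10, 10)]))                     # 1.0
print(localization_score([(0, 0, 10, 10)], [(0, 0, 10, 10), (20, 20, 30, 30)]))   # 0.75
```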
Pages: 75:1-75:8
Citations: 2
Reinforced random forest
Pub Date : 2016-12-18 DOI: 10.1145/3009977.3010003
Angshuman Paul, D. Mukherjee
Reinforcement learning improves classification accuracy, but its use is relatively unexplored for random forest classifiers. We propose a reinforced random forest (RRF) classifier that exploits reinforcement learning to improve classification accuracy. Our algorithm is initialized with a forest, and the entire training data is tested using this initial forest. To reinforce learning, we use misclassified data points to grow a certain number of new trees. A subset of the new trees is added to the existing forest using a novel graph-based approach, and we show that adding these trees ensures improvement in classification accuracy. This process continues iteratively until classification accuracy saturates. The proposed RRF has a low computational burden. We achieve at least a 3% improvement in F-measure compared to random forest on three breast cancer datasets, and results on benchmark datasets show a significant reduction in average classification error.
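The control loop above (test, retrain on mistakes, keep new trees only while accuracy improves) can be sketched with a toy 1-D forest of decision stumps. The stumps, majority vote, and simple keep-if-better rule are stand-ins for the paper's trees and graph-based tree selection.

```python
from collections import Counter

def fit_stump(data):
    """Fit a 1-D threshold stump (predict `pol` when x <= t) minimizing
    training error on `data`, a list of (x, label) pairs with labels 0/1."""
    best = None
    for t in sorted({x for x, _ in data}):
        for pol in (0, 1):
            err = sum((pol if x <= t else 1 - pol) != y for x, y in data)
            if best is None or err < best[0]:
                best = (err, t, pol)
    _, t, pol = best
    return lambda x: pol if x <= t else 1 - pol

def predict(forest, x):
    return Counter(tree(x) for tree in forest).most_common(1)[0][0]

def accuracy(forest, data):
    return sum(predict(forest, x) == y for x, y in data) / len(data)

def reinforced_forest(data, rounds=5):
    """RRF-style loop (toy sketch): start from an initial forest, grow
    trees on the currently misclassified points, keep them only if
    training accuracy improves, and stop once accuracy saturates."""
    forest = [fit_stump(data)]
    best = accuracy(forest, data)
    for _ in range(rounds):
        miss = [(x, y) for x, y in data if predict(forest, x) != y]
        if not miss:
            break  # nothing left to reinforce
        candidate = forest + [fit_stump(miss), fit_stump(miss)]
        acc = accuracy(candidate, data)
        if acc <= best:
            break  # accuracy saturated
        forest, best = candidate, acc
    return forest, best
```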
Pages: 1:1-1:8
Citations: 12
Spectral decomposition and progressive reconstruction of scalar volumes
Pub Date : 2016-12-18 DOI: 10.1145/3009977.3010017
Uddipan Mukherjee
Modern 3D imaging technologies often generate large-scale volume datasets that may be represented as 3-way tensors. These volume datasets are usually compressed for compact storage, and interactive visual analysis of the data warrants efficient real-time decompression techniques. Using well-known tensor decomposition techniques like CP or Tucker decomposition, the volume data can be represented by a few basis vectors; the number of such vectors, called the rank of the tensor, determines the visual quality. However, in such methods the basis vectors used at successive ranks are completely different, requiring a complete recomputation of basis vectors whenever the visual quality needs to be altered. In this work, a new progressive decomposition technique is introduced for scalar volumes wherein new basis vectors are added to the already existing lower-rank basis vectors. Large-scale datasets are usually divided into bricks of smaller size, and each such brick is represented in compressed form. The bases used for the different bricks are data dependent and completely different from one another. The decomposition method introduced here uses the same basis vectors for all the bricks at all hierarchical levels of detail. The basis vectors are data independent, thereby minimizing storage and allowing fast data reconstruction.
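The "progressive" property the abstract seeks, where raising the rank only adds basis vectors instead of recomputing them, can be illustrated with a 2-D matrix and the SVD, whose rank-r approximation reuses the first r singular vectors. The SVD stands in here for the paper's own decomposition; this is not the authors' method.

```python
import numpy as np

def progressive_svd(data, ranks):
    """Reconstruct `data` at each rank in `ranks` (ascending), reusing the
    running reconstruction so that each refinement only *adds* the new
    basis vectors (rank-one terms) rather than starting over."""
    U, s, Vt = np.linalg.svd(data, full_matrices=False)
    recon = np.zeros_like(data)
    out, prev = [], 0
    for r in ranks:
        for k in range(prev, r):                      # only the new terms
            recon = recon + s[k] * np.outer(U[:, k], Vt[k])
        prev = r
        out.append(recon.copy())
    return out

# Error decreases monotonically as basis vectors are appended.
rng = np.random.default_rng(0)
data = rng.standard_normal((6, 5))
recons = progressive_svd(data, [1, 3, 5])
print([round(float(np.linalg.norm(data - r)), 3) for r in recons])
```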
Pages: 31:1-31:8
Citations: 0
Event recognition in egocentric videos using a novel trajectory based feature
Pub Date : 2016-12-18 DOI: 10.1145/3009977.3010011
Vinodh Buddubariki, Sunitha Gowd Tulluri, Snehasis Mukherjee
This paper proposes an approach for event recognition in egocentric videos using dense trajectories over the Gradient Flow - Space Time Interest Point (GF-STIP) feature. We focus on recognizing events of diverse categories (including indoor and outdoor activities, sports, social activities and adventures) in egocentric videos. We introduce a dataset with diverse egocentric events, as all the existing egocentric activity recognition datasets consist of indoor videos only. The dataset introduced in this paper contains 102 videos with 9 different events (containing indoor and outdoor videos with varying lighting conditions). We extract Space Time Interest Points (STIP) from each frame of the video. The interest points are taken as the lead pixels, and Gradient-Weighted Optical Flow (GWOF) features are calculated at the lead pixels by multiplying the optical flow measure by the magnitude of the gradient at the pixel, obtaining the GF-STIP feature. We construct pose descriptors with the GF-STIP feature. We use the GF-STIP descriptors for recognizing events in egocentric videos with three different approaches: following a Bag of Words (BoW) model, implementing Fisher Vectors, and obtaining dense trajectories for the videos. We show that dense trajectory features based on the proposed GF-STIP descriptors enhance the efficacy of the event recognition system in egocentric videos.
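The GWOF computation described above, multiplying optical-flow magnitude by image-gradient magnitude at the lead pixels, can be sketched as below. The flow fields and interest-point locations are assumed to be precomputed by upstream steps (STIP detection, optical flow) that are not shown.

```python
import numpy as np

def gwof_at_points(flow_u, flow_v, image, points):
    """Gradient-Weighted Optical Flow at the lead pixels: the optical-flow
    magnitude at each interest point is multiplied by the image-gradient
    magnitude there (a sketch of the feature construction in the abstract).

    flow_u, flow_v : per-pixel optical-flow components
    image          : grayscale frame
    points         : (row, col) interest-point locations
    """
    gy, gx = np.gradient(image.astype(float))
    grad_mag = np.hypot(gx, gy)
    flow_mag = np.hypot(flow_u, flow_v)
    return [float(flow_mag[r, c] * grad_mag[r, c]) for r, c in points]

# On a textureless (constant) frame the gradient vanishes, so GWOF is zero
# everywhere regardless of the motion.
u = np.ones((5, 5))
v = np.ones((5, 5))
flat = np.full((5, 5), 7.0)
print(gwof_at_points(u, v, flat, [(2, 2), (1, 3)]))  # [0.0, 0.0]
```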
Pages: 76:1-76:8
Citations: 6
Low complexity encoder for feedback-channel-free distributed video coding using deep convolutional neural networks at the decoder
Pub Date : 2016-12-18 DOI: 10.1145/3009977.3009986
Pudi Raj Bhagath, J. Mukherjee, Sudipta Mukopadhayay
We propose a very low complexity encoder for feedback-channel-free distributed video coding (DVC) applications, using a deep convolutional neural network (CNN) at the decoder side. Deep CNNs for super-resolution use low-resolution (LR) images carrying 25% of the pixel information of a high-resolution (HR) image to super-resolve it by a factor of 2. Instead, we train the network with 50% of noisy Wyner-Ziv (WZ) pixels to recover the full original WZ frame; at the decoder, the deep CNN thus reconstructs the original WZ image from 50% noisy WZ pixels. These noisy samples are obtained from an iterative algorithm called DLRTex. At the encoder side we compute the local rank transform (LRT) of WZ frames for alternate pixels, instead of all pixels, to reduce bit rate and complexity. The local rank transformed values are merged, and their rank positions in the WZ frame are entropy coded using an MQ-coder. In addition, average intensity values of each block of the WZ frame are transmitted to assist motion estimation. At the decoder, side information (SI) is generated by implementing motion estimation and compensation in the LRT domain. The DLRTex algorithm is executed on the SI using the LRT to obtain the 50% noisy WZ pixels used in reconstructing the full WZ frame. We compare our results with pixel-domain DVC approaches and show that the coding efficiency of our codec is better than that of pixel-domain distributed video coders based on low-density parity check and accumulate (LDPCA) or turbo codes. We also derive the complexity of our encoder in terms of the number of operations and show that it is much lower than that of LDPCA-based methods.
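A local rank transform of the kind used at the encoder can be sketched as below: each pixel is replaced by its rank (the count of smaller neighbors) within a small window. The 3x3 window and the strictly-smaller comparison are assumptions; the paper's exact LRT definition and the alternate-pixel subsampling are not reproduced here.

```python
def local_rank_transform(img, radius=1):
    """Local rank transform sketch: replace each pixel by the number of
    pixels in its (2*radius+1)^2 neighborhood (clipped at the image
    border) that are strictly smaller than it."""
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            rank = 0
            for dr in range(-radius, radius + 1):
                for dc in range(-radius, radius + 1):
                    rr, cc = r + dr, c + dc
                    if 0 <= rr < h and 0 <= cc < w and img[rr][cc] < img[r][c]:
                        rank += 1
            out[r][c] = rank
    return out

# Ranks depend only on the local ordering of intensities, which is what
# makes the representation cheap to code and robust to intensity offsets.
print(local_rank_transform([[1, 2], [3, 4]]))  # [[0, 1], [2, 3]]
```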
Pages: 44:1-44:7
Citations: 3
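The local rank transform at the heart of this encoder can be illustrated with a minimal sketch. Here the LRT is taken as one common definition — the rank of the centre pixel among its 3×3 neighbourhood — and the paper's exact variant (including its alternate-pixel subsampling) is not reproduced; this is an illustrative assumption, not the authors' implementation.

```python
import numpy as np

def local_rank_transform(img, radius=1):
    """Rank of each pixel within its (2*radius+1)^2 neighbourhood:
    the count of neighbourhood pixels strictly darker than the centre.
    Border pixels are left at rank 0 for simplicity."""
    img = np.asarray(img, dtype=np.int32)
    h, w = img.shape
    out = np.zeros((h, w), dtype=np.int32)
    for y in range(radius, h - radius):
        for x in range(radius, w - radius):
            patch = img[y - radius:y + radius + 1, x - radius:x + radius + 1]
            out[y, x] = int(np.sum(patch < img[y, x]))
    return out

# On a monotone 3x3 ramp the centre pixel outranks exactly its 4 darker neighbours.
ramp = np.arange(9).reshape(3, 3)
ranks = local_rank_transform(ramp)
```

Ranks are invariant to monotone intensity changes, which suggests why entropy-coding rank positions instead of raw pixel values can reduce the bit rate.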
Detection and segmentation of mirror-like surfaces using structured illumination 使用结构照明的镜面检测和分割
Pub Date : 2016-12-18 DOI: 10.1145/3009977.3010020
R. Aggarwal, A. Namboodiri
In computer vision, many active illumination techniques employ Projector-Camera systems to extract useful information from scenes. Known illumination patterns are projected onto the scene, and their deformations in the captured images are then analyzed. We observe that, for mirror-like surfaces, the local frequencies in the captured pattern differ from those of the projected pattern. This property allows us to design a custom Projector-Camera system that segments mirror-like surfaces by analyzing the local frequencies in the captured images. The system projects a sinusoidal pattern and captures images from the projector's point of view. We present segmentation results for scenes that include multiple reflections and inter-reflections from mirror-like surfaces. The method can further be used to separate the direct and global components of mirror-like surfaces by illuminating the non-mirror-like objects separately. We also show how our method aids accurate shape estimation of non-mirror-like regions when mirror-like regions are present in a scene.
在计算机视觉中,许多主动照明技术采用投影-相机系统从场景中提取有用的信息。已知的照明模式被投射到场景上,然后在捕获的图像中分析它们的变形。我们观察到,在镜状表面的捕获模式的局部频率不同于投影模式。这个属性允许我们设计一个定制的投影-相机系统,通过分析捕获图像中的本地频率来分割镜面。该系统投射一个正弦模式,并从投影仪的角度捕获图像。我们给出了包括镜状表面的多重反射和间反射在内的场景分割结果。该方法可进一步用于分离镜面的直接分量和全局分量,通过分别照射非镜面物体。我们展示了我们的方法如何在场景中存在镜像区域的情况下准确估计非镜像区域的形状。
{"title":"Detection and segmentation of mirror-like surfaces using structured illumination","authors":"R. Aggarwal, A. Namboodiri","doi":"10.1145/3009977.3010020","DOIUrl":"https://doi.org/10.1145/3009977.3010020","url":null,"abstract":"In computer vision, many active illumination techniques employ Projector-Camera systems to extract useful information from the scenes. Known illumination patterns are projected onto the scene and their deformations in the captured images are then analyzed. We observe that the local frequencies in the captured pattern for the mirror-like surfaces is different from the projected pattern. This property allows us to design a custom Projector-Camera system to segment mirror-like surfaces by analyzing the local frequencies in the captured images. The system projects a sinusoidal pattern and capture the images from projector's point of view. We present segmentation results for the scenes including multiple reflections and inter-reflections from the mirror-like surfaces. The method can further be used in the separation of direct and global components for the mirror-like surfaces by illuminating the non-mirror-like objects separately. We show how our method is also useful for accurate estimation of shape of the non-mirror-like regions in the presence of mirror-like regions in a scene.","PeriodicalId":93806,"journal":{"name":"Proceedings. Indian Conference on Computer Vision, Graphics & Image Processing","volume":"448 1","pages":"66:1-66:8"},"PeriodicalIF":0.0,"publicationDate":"2016-12-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"86856550","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1
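The frequency cue this abstract relies on can be shown in a toy 1-D sketch (my own illustration, not the paper's Projector-Camera pipeline): a sinusoid of known frequency `f0` is "projected", and windows whose dominant captured frequency deviates from `f0` are flagged as mirror-like. The doubled frequency in the mirror region, the window size, and the tolerance are all illustrative assumptions.

```python
import numpy as np

def dominant_freq(window):
    """Dominant spatial frequency (cycles/sample) of a 1-D window via the FFT."""
    spectrum = np.abs(np.fft.rfft(window - window.mean()))
    return np.argmax(spectrum) / len(window)

def segment_by_frequency(captured, f0, win=32, tol=0.25):
    """True for windows whose local frequency deviates from the projected f0."""
    flags = []
    for start in range(0, len(captured) - win + 1, win):
        f = dominant_freq(captured[start:start + win])
        flags.append(bool(abs(f - f0) / f0 > tol))  # True => mirror-like
    return flags

# Toy scene: a diffuse half keeps the projected frequency, while a
# mirror-like half returns the pattern at (here) double the frequency.
f0 = 1 / 8
x = np.arange(64)
diffuse = np.sin(2 * np.pi * f0 * x)
mirror = np.sin(2 * np.pi * 2 * f0 * x)
scene = np.concatenate([diffuse, mirror])
labels = segment_by_frequency(scene, f0)
```

A real system would compare 2-D local spectra of the captured pattern against the projected one per image patch, but the decision rule — deviation of local frequency from the known projection — is the same.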
Efficient adaptive weighted minimization for compressed sensing magnetic resonance image reconstruction 压缩感知磁共振图像重构的有效自适应加权最小化
Pub Date : 2016-12-18 DOI: 10.1145/3009977.3009991
S. Datta, B. Deka
Compressed sensing magnetic resonance imaging (CSMRI) has demonstrated that MRI scan time can be reduced by acquiring fewer k-space measurements without significant loss of anatomical detail. The number of k-space measurements required is roughly proportional to the sparsity of the MR signal under consideration. Recently, a few works on CSMRI have shown that the sparsity of the MR signal can be enhanced by suitably weighting different regularization priors. In this paper, we propose an efficient adaptive weighted reconstruction algorithm that enhances the sparsity of the MR image. Experimental results show that the proposed algorithm gives better reconstructions from fewer measurements, without a significant increase in computation time compared to existing algorithms along this line.
压缩感知磁共振成像(CSMRI)已经证明,可以通过减少k空间中的测量次数来加速MRI扫描时间,而不会显著损失解剖细节。k空间测量的数量大致与所考虑的MR信号的稀疏度成正比。近年来,一些关于CSMRI的研究表明,通过对不同正则化先验的适当加权,可以增强磁共振信号的稀疏性。本文提出了一种有效的自适应加权重建算法来增强磁共振图像的稀疏性。实验结果表明,与现有算法相比,该算法在不显著增加计算时间的前提下,以较少的测量次数获得了更好的重建效果。
{"title":"Efficient adaptive weighted minimization for compressed sensing magnetic resonance image reconstruction","authors":"S. Datta, B. Deka","doi":"10.1145/3009977.3009991","DOIUrl":"https://doi.org/10.1145/3009977.3009991","url":null,"abstract":"Compressed sensing magnetic resonance imaging (CSMRI) have demonstrated that it is possible to accelerate MRI scan time by reducing the number of measurements in the k-space without significant loss of anatomical details. The number of k-space measurements is roughly proportional to the sparsity of the MR signal under consideration. Recently, a few works on CSMRI have revealed that the sparsity of the MR signal can be enhanced by suitable weighting of different regularization priors. In this paper, we have proposed an efficient adaptive weighted reconstruction algorithm for the enhancement of sparsity of the MR image. Experimental results show that the proposed algorithm gives better reconstructions with less number of measurements without significant increase of the computational time compared to existing algorithms in this line.","PeriodicalId":93806,"journal":{"name":"Proceedings. Indian Conference on Computer Vision, Graphics & Image Processing","volume":"10 1","pages":"95:1-95:8"},"PeriodicalIF":0.0,"publicationDate":"2016-12-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"91086977","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 5
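The adaptive-weighting idea can be illustrated with a generic reweighted iterative soft-thresholding loop, in the spirit of reweighted ℓ1 minimization. This is a sketch, not the authors' algorithm: `lam`, `eps`, the iteration counts, and the toy Gaussian sensing matrix are all assumptions for illustration.

```python
import numpy as np

def reweighted_ista(A, y, lam=0.05, inner=200, outer=4, eps=1e-2):
    """Reweighted ISTA for min_x 0.5*||Ax - y||^2 + lam*||W x||_1, where the
    diagonal weights W are refreshed from the current estimate so that small
    coefficients are penalised more (sparsity-enhancing)."""
    n = A.shape[1]
    L = np.linalg.norm(A, 2) ** 2            # Lipschitz constant of the gradient
    x = np.zeros(n)
    w = np.ones(n)
    for _ in range(outer):
        for _ in range(inner):
            z = x - A.T @ (A @ x - y) / L    # gradient step on the data-fit term
            t = lam * w / L
            x = np.sign(z) * np.maximum(np.abs(z) - t, 0.0)  # weighted shrinkage
        w = 1.0 / (np.abs(x) + eps)          # adapt the weights to the estimate
    return x

# Toy compressed-sensing problem: recover a 3-sparse signal of length 100
# from 40 random Gaussian measurements.
rng = np.random.default_rng(0)
A = rng.standard_normal((40, 100)) / np.sqrt(40)
x_true = np.zeros(100)
x_true[[5, 37, 80]] = [1.5, -2.0, 1.0]
x_hat = reweighted_ista(A, A @ x_true)
```

Each outer pass re-weights the ℓ1 penalty from the current estimate, so large surviving coefficients are penalised less and the reconstruction sharpens without extra measurements — the same effect the paper pursues with weighted regularization priors in k-space.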