
Proceedings. Indian Conference on Computer Vision, Graphics & Image Processing: latest publications

Hierarchical structured learning for indoor autonomous navigation of Quadcopter
Pub Date : 2016-12-18 DOI: 10.1145/3009977.3009990
Vishakh Duggal, K. Bipin, Utsav Shah, K. Krishna
Autonomous navigation of a generic monocular quadcopter in indoor environments requires sophisticated approaches to perception, planning and control. This paper presents a system that enables a miniature quadcopter with a frontal monocular camera to autonomously navigate and explore an unknown indoor environment. First, the system estimates a dense depth map of the environment from a single video frame using our proposed supervised Hierarchical Structured Learning (HSL) technique, which yields both high accuracy and better generalization. HSL discretizes the overall depth range into multiple sets and structures them hierarchically and recursively: each set of classes is partitioned into two subsets, each representing an apportioned depth range of the parent set, forming a binary tree. A binary Support Vector Machine (SVM) classifier is trained separately at each internal node of the tree. Depth estimation for each pixel then proceeds top-down from the root node, classifying repeatedly until it reaches a leaf node representing the estimated depth. The generated depth map is provided as input to a Convolutional Neural Network (CNN), which generates flight-planning commands. Finally, a trajectory planning and control module employs a convex programming technique to generate a collision-free minimum-time trajectory that follows these flight-planning commands and produces appropriate control inputs for the quadcopter. The results convey unequivocally the advantages of depth perception by HSL, while repeated successful flights in typical indoor corridors confirm the efficacy of the pipeline.
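The hierarchical classifier described above can be sketched as follows: depth classes form a binary tree, each internal node holds an SVM deciding "nearer half" vs "farther half", and a pixel's feature vector descends root-to-leaf. This is a minimal illustration with toy 1-D features, not the paper's actual features or kernel settings.

```python
import numpy as np
from sklearn.svm import SVC

def build_tree(classes):
    """Recursively split an ordered list of depth classes into a binary tree."""
    if len(classes) == 1:
        return {"leaf": classes[0]}
    mid = len(classes) // 2
    return {"left_classes": set(classes[:mid]),
            "left": build_tree(classes[:mid]),
            "right": build_tree(classes[mid:]),
            "svm": None}

def train_tree(node, X, y):
    """Train one binary SVM per internal node on the samples reaching it."""
    if "leaf" in node:
        return
    left = np.isin(y, list(node["left_classes"]))
    node["svm"] = SVC(kernel="rbf").fit(X, left)
    train_tree(node["left"], X[left], y[left])
    train_tree(node["right"], X[~left], y[~left])

def predict_one(node, x):
    """Descend the tree top-down until a leaf (depth class) is reached."""
    while "leaf" not in node:
        go_left = node["svm"].predict(x.reshape(1, -1))[0]
        node = node["left"] if go_left else node["right"]
    return node["leaf"]

# Toy data: four depth classes with a well-separated 1-D "feature".
X = np.array([[0.0], [0.5], [10.0], [10.5], [20.0], [20.5], [30.0], [30.5]])
y = np.array([0, 0, 1, 1, 2, 2, 3, 3])
tree = build_tree([0, 1, 2, 3])
train_tree(tree, X, y)
```

Each node only sees the samples of its own depth sub-range, which is what makes the recursive partitioning progressively finer.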
Citations: 3
Automatic video matting through scribble propagation
Pub Date : 2016-12-18 DOI: 10.1145/3009977.3009979
Bhoomika Sonane, S. Ramakrishnan, S. Raman
Video matting is an extension of image matting, used to extract the foreground matte from the arbitrary background of every frame in a video sequence. We introduce an automatic scribbling approach for video matting based on the relative motion of the foreground object with respect to the background. The proposed scribble propagation, and the subsequent isolation of foreground and background, is much more intuitive than the conventional trimap propagation used for video matting. Alpha maps are propagated according to the optical flow estimated between consecutive frames to obtain a preliminary estimate of the foreground and background in the following frame. Accurate scribbles are then placed near the boundary of the foreground region, and the scribbled image is refined with the help of morphological operations. We show that a high-quality matte of the foreground object can be obtained using a state-of-the-art image matting technique, and that the results of the proposed method are accurate and comparable with those of other state-of-the-art video matting techniques.
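The propagation step above can be illustrated with a minimal sketch: labels marked on frame t are pushed to frame t+1 along a dense optical-flow field, giving an automatic initial labelling for the next frame. The flow field here is synthetic; a real system would estimate it from the frame pair.

```python
import numpy as np

def propagate_scribbles(scribble_mask, flow):
    """scribble_mask: HxW bool; flow: HxWx2 per-pixel (dx, dy) displacement."""
    h, w = scribble_mask.shape
    out = np.zeros_like(scribble_mask)
    ys, xs = np.nonzero(scribble_mask)
    for y, x in zip(ys, xs):
        dx, dy = flow[y, x]
        nx, ny = int(round(x + dx)), int(round(y + dy))
        if 0 <= nx < w and 0 <= ny < h:   # drop scribbles that leave the frame
            out[ny, nx] = True
    return out

# One scribbled pixel, uniform motion of (+2, +1) pixels.
mask = np.zeros((5, 5), dtype=bool)
mask[2, 2] = True
flow = np.zeros((5, 5, 2))
flow[..., 0] = 2.0   # dx
flow[..., 1] = 1.0   # dy
moved = propagate_scribbles(mask, flow)
```

The refinement with morphological operations mentioned in the abstract would then clean up this propagated mask near the object boundary.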
Citations: 4
Towards semantic visual representation: augmenting image representation with natural language descriptors
Pub Date : 2016-12-18 DOI: 10.1145/3009977.3010010
Konda Reddy Mopuri, R. Venkatesh Babu
Learning image representations is an interesting and challenging problem. When users upload images to photo-sharing websites, they often provide multiple textual tags for ease of reference. These tags can reveal significant information about the content of the image, such as the objects present or the action taking place. Approaches have been proposed to extract additional information from these tags in order to augment the visual cues and build a multi-modal image representation. However, existing approaches pay little attention to the semantic meaning of the tags while encoding them. In this work, we attempt to enrich the image representation with tag encodings that leverage their semantics. Our approach utilizes neural-network-based natural language descriptors to represent the tag information. By complementing the visual features learned by convnets, our approach yields an efficient multi-modal image representation. Experimental evaluation suggests that exploiting the two data modalities results in a better multi-modal image representation for classification on benchmark datasets.
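A hedged sketch of the fusion idea: convnet features for an image are concatenated with a semantic encoding of its tags built from word vectors. The two-dimensional word vectors and the mean-pooling here are toy stand-ins, not the paper's actual embeddings.

```python
import numpy as np

# Hypothetical tiny word-vector table standing in for a learned embedding.
word_vecs = {"dog":  np.array([1.0, 0.0]),
             "park": np.array([0.0, 1.0])}

def tag_encoding(tags):
    """Mean of the word vectors of the tags: simple semantic pooling."""
    return np.mean([word_vecs[t] for t in tags], axis=0)

def fuse(visual_feat, tags):
    """Concatenate visual features with the tag encoding into one descriptor."""
    return np.concatenate([visual_feat, tag_encoding(tags)])

desc = fuse(np.array([0.5, 0.25, 0.75]), ["dog", "park"])
```

The resulting descriptor can then be fed to any standard classifier, which is how the two modalities are exploited jointly.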
Citations: 2
Data-driven 2D effects animation
Pub Date : 2016-12-18 DOI: 10.1145/3009977.3010000
Divya Grover, P. Chaudhuri
Making plausible, high-quality visual effects, like water splashes or fire, in traditional 2D animation pipelines requires an animator to draw many frames of phenomena that are very difficult to recreate manually. We present a technique that uses a database of video clips of such phenomena to assist the animator, who only has to input sample sketched frames of the phenomenon at particular time instants. These sketches are matched to frames of the video clips, and a plausible frame sequence is generated from the clips with the animator-drawn frames as constraints. The colour style of the hand-drawn frames is used to render the generated frames, resulting in a 2D animation that follows the style and intent of the 2D animator. Our system can also create multi-layered effects animation, allowing the animator to draw interacting mixed phenomena, like water being poured on fire.
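The retrieval step above can be sketched minimally: a drawn keyframe is matched to the most similar frame in the clip database. Plain L2 distance on flattened frames is an assumption for illustration; the paper does not specify this descriptor.

```python
import numpy as np

def nearest_frame(sketch, clip_frames):
    """Return the index of the database frame closest to the sketched frame."""
    dists = [np.linalg.norm(sketch - f) for f in clip_frames]
    return int(np.argmin(dists))

# Toy "clip": three constant-intensity frames; sketch is closest to the middle one.
clip = [np.full((4, 4), v) for v in (0.0, 0.5, 1.0)]
best = nearest_frame(np.full((4, 4), 0.6), clip)
```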
Citations: 1
Image defencing via signal demixing
Pub Date : 2016-12-18 DOI: 10.1145/3009977.3009984
Veepin Kumar, J. Mukherjee, S. Mandal
We present a novel algorithm to remove near-regular, fence- or wire-like foreground patterns from an image. Fence detection and removal algorithms developed so far perform poorly at detecting the fence. We use signal demixing to exploit the sparsity and regularity properties of fences in order to detect them. Results demonstrate the effectiveness of our technique compared to other state-of-the-art techniques.
Citations: 8
A biologically inspired saliency model for color fundus images
Pub Date : 2016-12-18 DOI: 10.1145/3009977.3010041
Samrudhdhi B. Rangrej, J. Sivaswamy
Saliency computation is widely studied in computer vision but not in medical imaging. Existing computational saliency models have been developed for general (natural) images and hence may not be suitable for medical images, owing to the variety of imaging modalities and the requirement that such models capture not only normal anatomy but also deviations from it. We present a biologically inspired model for colour fundus images and illustrate it for the case of diabetic retinopathy. The proposed model uses spatially-varying morphological operations to enhance lesions locally and combines an ensemble of the results of such operations to generate the saliency map. The model is validated against an average human gaze map from 15 experts and found to have 10% higher recall (at 100% precision) than four leading saliency models proposed for natural images. The F-score for matching manual lesion markings by 5 experts was 0.4 for our model (as opposed to 0.532 for the gaze map) and very poor for existing models. The model's utility is shown via a novel enhancement method which employs saliency to selectively enhance the abnormal regions; this was found to boost their contrast-to-noise ratio by ∼30%.
Citations: 0
Deep neural networks for segmentation of basal ganglia sub-structures in brain MR images
Pub Date : 2016-12-18 DOI: 10.1145/3009977.3010048
Akshay Sethi, Akshat Sinha, Ayush Agarwal, Chetan Arora, Anubha Gupta
Automated segmentation of brain structures in magnetic resonance imaging (MRI) scans is an important first step in the diagnosis of many neurological diseases. In this paper, we focus on segmentation of the constituent sub-structures of the basal ganglia (BG) region of the brain, which is responsible for controlling movement and routine learning. Low-contrast voxels and undefined boundaries across the sub-regions of the BG pose a challenge for automated segmentation. We pose segmentation as a voxel classification problem and propose a Deep Neural Network (DNN) based classifier for BG segmentation. The DNN learns distinct regional features for voxel-wise classification of the BG area into four sub-regions, namely the caudate, putamen, pallidum, and accumbens. We use a public dataset with a collection of 83 T1-weighted, uniform-dimension structural MRI scans of healthy and diseased (bipolar with and without psychosis, schizophrenia) subjects. To build a robust classifier, the proposed model was trained on a mixed collection of healthy and diseased MR scans. We report an accuracy above 94% (as calculated using the Dice coefficient) for all four classes on the healthy and diseased dataset.
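The voxel-classification framing can be sketched as follows: each voxel contributes a feature vector and a small neural network predicts one of the four sub-structure labels. The one-dimensional toy "intensity" features and the network size are assumptions for illustration only; the paper's actual regional features and architecture are not reproduced here.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(1)

# Four toy "sub-regions", each with a distinct mean voxel intensity.
centres = np.array([[0.1], [0.4], [0.7], [1.0]])
X = np.vstack([c + rng.normal(0, 0.02, (40, 1)) for c in centres])
y = np.repeat(np.arange(4), 40)   # labels 0..3: caudate, putamen, pallidum, accumbens

# Small MLP classifying each voxel's feature vector into a sub-region label.
clf = MLPClassifier(hidden_layer_sizes=(16,), max_iter=2000,
                    random_state=0).fit(X, y)
```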
Citations: 0
Recognizing facial expressions using novel motion based features
Pub Date : 2016-12-18 DOI: 10.1145/3009977.3010004
Snehasis Mukherjee, B. Vamshi, K. V. Sai Vineeth Kumar Reddy, Repala Vamshi Krishna, S. V. S. Harish
This paper introduces two novel motion-based features for recognizing human facial expressions from a video sequence. The proposed bag-of-words scheme represents each frame of a video sequence as a vector depicting local motion patterns during a facial expression; the local motion patterns are captured by an efficient derivation from optical flow. Motion features are clustered and stored as words in a dictionary. We further generate a reduced dictionary by ranking the words based on an ambiguity measure, pruning out the ambiguous words and keeping the key words. The ambiguity measure is obtained with a graph-based technique in which each word is represented as a node and the frequency of occurrence of the word during the expression is modelled. We form expression descriptors for each expression from the reduced dictionary by applying an efficient kernel, and the expression descriptors are trained following an adaptive learning technique. Tested on a standard dataset, the proposed approach shows better accuracy than the state-of-the-art.
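The bag-of-words step can be sketched concretely: local motion features are clustered into a dictionary of "words", and a frame (or sequence) is described by the normalised histogram of word assignments. The synthetic two-cluster features below stand in for the paper's optical-flow-derived features.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)

# Toy "motion features": two tight clusters standing in for two motion patterns.
feats = np.vstack([rng.normal(0, 0.1, (50, 2)),
                   rng.normal(5, 0.1, (50, 2))])

# Build the dictionary by clustering the features into words.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(feats)

def bow_histogram(frame_feats):
    """Normalised histogram of dictionary-word assignments for one frame."""
    words = kmeans.predict(frame_feats)
    return np.bincount(words, minlength=kmeans.n_clusters) / len(words)

hist = bow_histogram(feats)
```

Dictionary reduction would then drop the words the graph-based ambiguity measure flags, shrinking the histogram to the key words only.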
Citations: 4
Autoregressive hidden Markov model with missing data for modelling functional MR imaging data
Pub Date : 2016-12-18 DOI: 10.1145/3009977.3010021
Shilpa Dang, S. Chaudhury, Brejesh Lall, P. Roy
Functional Magnetic Resonance Imaging (fMRI) has opened ways to look inside the active human brain. However, the fMRI signal is an indirect indicator of underlying neuronal activity and has low temporal resolution due to the acquisition process. This paper proposes an autoregressive hidden Markov model with missing data (AR-HMM-md) framework that addresses these issues while accurately capturing the characteristics of fMRI time series. The proposed work models unobserved neuronal activity over time as a sequence of discrete hidden states, and shows how exact inference can be obtained with missing fMRI data under the "Missing Not at Random" (MNAR) mechanism, which requires explicit modelling of the missing data along with the observed data. Performance is evaluated by observing the convergence characteristics of the log-likelihoods and the classification capability of the proposed model against existing models on two fMRI datasets. Classification is performed between real fMRI time series from a task-based experiment and randomly generated time series; a second classification experiment distinguishes children from elderly subjects using fMRI time series from resting-state data. The proposed model captures the fMRI characteristics efficiently and thus converges to a better posterior probability, resulting in higher classification accuracy than existing models on both datasets.
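The autoregressive-emission idea can be sketched with a tiny forward algorithm: each hidden state s carries an AR(1) coefficient a_s, and the emission density is p(x_t | x_{t-1}, s_t) = N(x_t; a_s x_{t-1}, sigma^2). All parameters below are illustrative, and the paper's missing-data (MNAR) machinery is omitted.

```python
import numpy as np

def gauss(x, mu, var):
    """Gaussian density N(x; mu, var)."""
    return np.exp(-(x - mu) ** 2 / (2 * var)) / np.sqrt(2 * np.pi * var)

def forward_loglik(x, a, trans, pi, var=0.1):
    """Scaled forward algorithm for an AR(1)-emission HMM.
    x: 1-D observations; a: per-state AR coefficients; trans: KxK matrix."""
    K = len(a)
    alpha = pi * gauss(x[0], 0.0, var)      # initial emission (zero prior mean)
    ll = 0.0
    for t in range(1, len(x)):
        em = np.array([gauss(x[t], a[k] * x[t - 1], var) for k in range(K)])
        alpha = em * (alpha @ trans)
        c = alpha.sum()
        alpha /= c                           # rescale to avoid underflow
        ll += np.log(c)
    return ll

# A sequence decaying as x_t = 0.9 x_{t-1} scores higher under a model that
# contains a state with a = 0.9 than under one without it.
x = 2.0 * (0.9 ** np.arange(10))
uniform = np.full((2, 2), 0.5)
good = forward_loglik(x, a=[0.9, 0.0], trans=uniform, pi=np.array([0.5, 0.5]))
bad = forward_loglik(x, a=[-0.9, 0.0], trans=uniform, pi=np.array([0.5, 0.5]))
```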
Citations: 5
Intrinsic image decomposition using focal stacks
Pub Date : 2016-12-18 DOI: 10.1145/3009977.3010046
Saurabh Saini, P. Sakurikar, P J Narayanan
In this paper, we present a novel method (RGBF-IID) for intrinsic image decomposition of scenes in the wild, without restrictions on the complexity, illumination, or scale of the image. We use focal stacks of the scene as input. A focal stack captures the scene at varying focal distances. Since focus depends on the distance to an object, this representation carries information beyond an RGB image, approaching an RGBD image with depth; we call our representation an RGBF image to highlight this. We use a robust focus measure and a generalized random walk algorithm to compute dense probability maps across the stack. These maps are used to define sparse local and global pixel neighbourhoods that adhere to the structure of the underlying 3D scene. We use these neighbourhood correspondences, together with standard chromaticity assumptions, as constraints in an optimization system. We present results on both indoor and outdoor scenes, using manually captured stacks of random objects under natural as well as artificial lighting conditions. We also test our system on a larger dataset of synthetically generated focal stacks from the NYUv2 and MPI Sintel datasets and show competitive performance against current state-of-the-art IID methods that use RGBD images. Our method provides strong evidence for the potential of the RGBF modality in place of RGBD in computer vision.
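The first ingredient of the pipeline above — a per-pixel focus measure evaluated across the focal stack — can be sketched as follows. This is a simplified illustration, not the authors' code: it uses a modified-Laplacian focus measure and a crude argmax over slices on a synthetic stack, where the paper instead combines a robust focus measure with a generalized random walk to obtain dense probability maps.

```python
import numpy as np

def focus_measure(img):
    """Modified Laplacian: sum of absolute second differences per pixel.
    High values indicate locally sharp (in-focus) regions."""
    lx = np.abs(2 * img - np.roll(img, 1, axis=1) - np.roll(img, -1, axis=1))
    ly = np.abs(2 * img - np.roll(img, 1, axis=0) - np.roll(img, -1, axis=0))
    return lx + ly

def depth_from_stack(stack):
    """stack: (S, H, W) focal stack. Returns the per-pixel index of the
    sharpest slice, a crude proxy for scene depth."""
    scores = np.stack([focus_measure(s) for s in stack])
    return scores.argmax(axis=0)

# Toy stack: slice 1 is a sharp checkerboard, slices 0 and 2 are blurred copies.
sharp = (np.indices((16, 16)).sum(axis=0) % 2) * 1.0
blur = (sharp + np.roll(sharp, 1, 1) + np.roll(sharp, -1, 1)
        + np.roll(sharp, 1, 0) + np.roll(sharp, -1, 0)) / 5.0
stack = np.stack([blur, sharp, blur])
depth = depth_from_stack(stack)
print((depth == 1).mean())  # fraction of pixels assigned to the sharp slice
```

In the paper these raw focus scores are not trusted per pixel; propagating them with a random walk yields the dense probability maps that define the neighbourhood constraints for the decomposition.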
Intrinsic image decomposition using focal stacks
Saurabh Saini, P. Sakurikar, P J Narayanan
DOI: 10.1145/3009977.3010046
Proceedings. Indian Conference on Computer Vision, Graphics & Image Processing, vol. 49, pp. 88:1-88:8, 2016-12-18.
Citations: 6
Journal
Proceedings. Indian Conference on Computer Vision, Graphics & Image Processing