Devinder Kumar, H. Neher, Arun Das, David A. Clausi, Steven L. Waslander
Robust place recognition systems are essential for long-term localization and autonomy. Such systems should recognize scenes under both conditional and viewpoint changes. In this paper, we present a deep-learning-based planar omni-directional place recognition approach that can simultaneously cope with conditional and viewpoint variations, including large viewpoint changes, which current methods do not address. We evaluate the proposed method on two real-world datasets covering, respectively, illumination and seasonal/weather changes, and changes that occurred in the environment over a period of one year. We provide both quantitative (recall at 100% precision) and qualitative (confusion matrices) comparisons of the basic place recognition pipeline for the omni-directional approach against single-view and side-view camera approaches. The results demonstrate the efficacy of the proposed omni-directional deep learning method over the single-view and side-view cameras in dealing with both conditional and large viewpoint changes.
{"title":"Condition and Viewpoint Invariant Omni-Directional Place Recognition Using CNN","authors":"Devinder Kumar, H. Neher, Arun Das, David A Clausi, Steven L. Waslander","doi":"10.1109/CRV.2017.26","DOIUrl":"https://doi.org/10.1109/CRV.2017.26","url":null,"abstract":"Robust place recognition systems are essential for long term localization and autonomy. Such systems should recognize scenes with both conditional and viewpoint changes. In this paper, we present a deep learning based planar omni-directional place recognition approach that can simultaneously cope with conditional and viewpoint variations, including large viewpoint changes, which current methods do not address. We evaluate the proposed method on two real world datasets dealing with illumination, seasonal/weather changes and changes occurred in the environment across a period of 1 year, respectively. We provide both quantitative (recall at 100% precision) and qualitative (confusion matrices) comparison of the basic pipeline of place recognition for the omni-directional approach with single-view and side-view camera approaches. The results prove the efficacy of the proposed omnidirectional deep learning method over the single-view and side-view cameras in dealing with both conditional and large viewpoint changes.","PeriodicalId":308760,"journal":{"name":"2017 14th Conference on Computer and Robot Vision (CRV)","volume":"76 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-05-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115273719","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Hoang Le, Carl S. Marshall, T. Doan, Long Mai, Feng Liu
Today's projectors are widely used for information and media display in stationary setups. There is also a growing effort to deploy projectors creatively, such as using a mobile projector to display visual content on an arbitrary surface. However, the quality of projected content is often limited by the quality of the projection surface, the environment lighting, and non-optimal projector settings. This paper presents a visual quality assessment method for projected content. Our method assesses the quality of the projected image by analyzing the projected image captured by a camera. The key challenge is that the quality of the captured image often differs from the quality perceived by a viewer, who "sees" the projected image differently than the camera does. To address this problem, our method employs a data-driven approach that learns from labeled data to bridge this gap. Our method integrates both manually crafted features and deep learning features and formulates projection quality assessment as a regression problem. Our experiments on a wide range of projection content, projection surfaces, and environment lighting show that our method can reliably score the quality of projected visual content in a way that is consistent with human perception.
{"title":"Visual Quality Assessment for Projected Content","authors":"Hoang Le, Carl S. Marshall, T. Doan, Long Mai, Feng Liu","doi":"10.1109/CRV.2017.47","DOIUrl":"https://doi.org/10.1109/CRV.2017.47","url":null,"abstract":"Today's projectors are widely used for information and media display in a stationary setup. There is also a growing effort to deploy projectors creatively, such as using a mobile projector to display visual content on an arbitrary surface. However, the quality of projected content is often limited by the quality of projection surface, environment lighting, and non-optimal projector settings. This paper presents a visual quality assessment method for projected content. Our method assesses the quality of the projected image by analyzing the projected image captured by a camera. The key challenge is that the quality of the captured image is often different from the perceived quality by a viewer as she \"sees\" the projected image differently than the camera. To address this problem, our method employs a data-driven approach that learns from the labeled data to bridge this gap. Our method integrates both manually crafted features and deep learning features and formulates projection quality assessment as a regression problem. Our experiments on a wide range of projection content, projection surfaces, and environment lighting show that our method can reliably score the quality of projected visual content in a way that is consistent with the human perception.","PeriodicalId":308760,"journal":{"name":"2017 14th Conference on Computer and Robot Vision (CRV)","volume":"27 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-05-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123045613","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Most methods for automatic estimation of external camera parameters (e.g., tilt angle) from deployed cameras are based on vanishing points. This requires that specific static scene features, e.g., sets of parallel lines, be present and reliably detected, and this is not always possible. An alternative is to use properties of the motion field computed over multiple frames. However, methods reported to date make strong assumptions about the nature of objects and motions in the scene, and often depend on feature tracking, which can be computationally intensive and unreliable. In this paper, we propose a novel motion-based approach for recovering camera tilt that does not require tracking. Our method assumes that motion statistics in the scene are stationary over the ground plane, so that statistical variation in image speed with vertical position in the image can be attributed to projection. The tilt angle is then estimated iteratively by nulling the variance in rectified speed explained by the vertical image coordinate. The method does not require tracking or learning and can therefore be applied without modification to diverse scene conditions. The algorithm is evaluated on four diverse datasets and found to outperform three alternative state-of-the-art methods.
{"title":"Estimating Camera Tilt from Motion without Tracking","authors":"Nada Elassal, J. Elder","doi":"10.1109/CRV.2017.36","DOIUrl":"https://doi.org/10.1109/CRV.2017.36","url":null,"abstract":"Most methods for automatic estimation of external camera parameters (e.g., tilt angle) from deployed cameras are based on vanishing points. This requires that specific static scene features, e.g., sets of parallel lines, be present and reliably detected, and this is not always possible. An alternative is to use properties of the motion field computed over multiple frames. However, methods reported to date make strong assumptions about the nature of objects and motions in the scene, and often depend on feature tracking, which can be computationally intensive and unreliable. In this paper, we propose a novel motion-based approach for recovering camera tilt that does not require tracking. Our method assumes that motion statistics in the scene are stationary over the ground plane, so that statistical variation in image speed with vertical position in the image can be attributed to projection. The tilt angle is then estimated iteratively by nulling the variance in rectified speed explained by the vertical image coordinate. The method does not require tracking or learning and can therefore be applied without modification to diverse scene conditions. The algorithm is evaluated on four diverse datasets and found to outperform three alternative state-of-the-art methods.","PeriodicalId":308760,"journal":{"name":"2017 14th Conference on Computer and Robot Vision (CRV)","volume":"79 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-05-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116156146","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Precise landing of multirotor unmanned aerial vehicles (UAVs) in confined, GPS-denied, and vision-compromised environments presents a challenge to common autopilot systems. In this work, we outline an autonomous infrared (IR) landing system using a ground-based IR radiator, a UAV-mounted IR camera, and an image processing computer. Previous work has focused on UAV-mounted IR sources for UAV localization, or on systems using multiple distributed ground-based IR sources to estimate UAV pose. We experimented with the use of a single ground-based IR radiator to determine the UAV's relative location in three-dimensional space. Our approach significantly simplifies the landing zone setup by requiring only a single IR source, and increases operational flexibility, as the vision-based system adapts to changes in landing zone position. The system is especially useful in vision-compromised applications such as nighttime operations or the smoky environments encountered during forest fires. We also evaluated a high-power IR radiator for future research on outdoor autonomous point-to-point navigation between IR sources where GPS is unavailable.
{"title":"Development of a Plug-and-Play Infrared Landing System for Multirotor Unmanned Aerial Vehicles","authors":"Ephraim Nowak, Kashish Gupta, H. Najjaran","doi":"10.1109/CRV.2017.23","DOIUrl":"https://doi.org/10.1109/CRV.2017.23","url":null,"abstract":"Precise landing of multirotor unmanned aerial vehicles (UAVs) in confined, GPS-denied and vision-compromised environments presents a challenge to common autopilot systems. In this work we outline an autonomous infrared (IR) landing system using a ground-based IR radiator, UAV-mounted IR camera, and image processing computer. Previous work has focused on UAV-mounted IR sources for UAV localization, or systems using multiple distributed ground-based IR sources to estimate UAV pose. We experimented with the use of a single ground-based IR radiator to determine the UAV's relative location in three-dimensional space. The outcome of our research significantly simplifies the landing zone setup by requiring only a single IR source, and increases operational flexibility, as the vision-based system adapts to changes in landing zone position. The usefulness of our system is especially demonstrated in vision-compromised applications such as nighttime operations, or in smoky environments observed during forest fires. We also evaluated a high-power IR radiator for future research in the field of outdoor autonomous point-to-point navigation between IR sources where GPS is unavailable.","PeriodicalId":308760,"journal":{"name":"2017 14th Conference on Computer and Robot Vision (CRV)","volume":"36 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-05-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126951844","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Existing video description approaches advocated in the literature rely on capturing the semantic relationships among concepts and visual features from training data specific to various datasets. Naturally, their success at generalizing video descriptions for a domain depends closely on the availability, representativeness, size, and annotation quality of the training data. Common issues are overfitting, the amount of training data required, and the computational time needed to train the model. To overcome these issues, we propose to alleviate the learning of semantic knowledge from domain-specific datasets by leveraging general human knowledge sources such as ConceptNet. We propose using ConceptNet as the source of knowledge for generating video descriptions within Grenander's pattern theory formalism. Instead of relying on training data to estimate the semantic compatibility of two concepts, we use the weights in ConceptNet, which quantify the degree of validity of an assertion between two concepts based on the underlying knowledge sources. We test and compare this idea on the task of generating semantically coherent descriptions for videos from the Breakfast Actions and Carnegie Mellon Multimodal Activities datasets. In comparison with other approaches, the proposed method achieves accuracy comparable to state-of-the-art methods based on HMMs and CFGs and generates semantically coherent descriptions even when presented with inconsistent action and object labels. We also show that the proposed approach performs comparably to models trained on domain-specific data.
{"title":"Towards a Knowledge-Based Approach for Generating Video Descriptions","authors":"Sathyanarayanan N. Aakur, F. Souza, Sudeep Sarkar","doi":"10.1109/CRV.2017.51","DOIUrl":"https://doi.org/10.1109/CRV.2017.51","url":null,"abstract":"Existent video description approaches advocated in the literature rely on capturing the semantic relationships among concepts and visual features from training data specific to various datasets. Naturally, their success at generalizing the video descriptions for the domain is closely dependent on the availability, representativeness, size and annotation quality of the training data. Common issues are overfitting, the amount of training data and computational time required for the model. To overcome these issues, we propose to alleviate the learning of semantic knowledge from domain-specific datasets by leveraging general human knowledge sources such as ConceptNet. We propose the use of ConceptNet as the source of knowledge for generating video descriptions using Grenander's pattern theory formalism. Instead of relying on training data to estimate semantic compatibility of two concepts, we use weights in the ConceptNet that determines the degree of validity of the assertion between two concepts based on the knowledge sources. We test and compare this idea on the task of generating semantically coherent descriptions for videos from the Breakfast Actions and Carnegie Mellon's Multimodal activities dataset. In comparison with other approaches, the proposed method achieves comparable accuracy against state-of-the-art methods based on HMMs and CFGs and generate semantically coherent descriptions even when presented with inconsistent action and object labels. We are also able to show that the proposed approach performs comparably with models trained on domain-specific data.","PeriodicalId":308760,"journal":{"name":"2017 14th Conference on Computer and Robot Vision (CRV)","volume":"17 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-05-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126644590","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Modern security cameras are capable of capturing high-resolution HD or 4K video and support embedded analytics that can automatically track objects such as people and cars moving through the scene. However, due to a lack of computational power on these cameras, the embedded video analytics cannot utilize the full available video resolution, severely limiting the range at which they can detect objects. We present a technique for scale correction, leveraging approximate camera calibration information, that uses higher image resolution in parts of the frame that are far from the camera and lower image resolution in parts of the frame that are closer to the camera. Existing background models can run on the proposed scale-normalized high-resolution (1280x720) video frame at a computational cost similar to that of an unnormalized 640x360 frame. Our proposed scale correction technique also improves object-level precision and recall.
{"title":"Scale-Corrected Background Modeling","authors":"P. Siva, Michael Jamieson","doi":"10.1109/CRV.2017.31","DOIUrl":"https://doi.org/10.1109/CRV.2017.31","url":null,"abstract":"Modern security cameras are capable of capturing high-resolution HD or 4K videos and support embedded analytics capable of automatically tracking objects such as people and cars moving through the scene. However, due to a lack of computational power on these cameras, the embedded video analytics cannot utilize the full available video resolution, severely limiting the range at which they can detect objects. We present a technique for scale correction, leveraging approximate camera calibration information, that uses high image resolutions in parts of the frame that are far from the camera and lower image resolution in parts of the frame that are closer to the camera. Existing background models can run on the proposed scale-normalized high-resolution (1280x720) video frame for a similar computational cost as an unnormalized 640x360 frame. Our proposed scale correction technique also improves object-level precision and recall.","PeriodicalId":308760,"journal":{"name":"2017 14th Conference on Computer and Robot Vision (CRV)","volume":"40 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-05-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127444768","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Nail biting, or onychophagia, is a body-focused repetitive behavior that is especially prevalent among children and adolescents. The behavior produces negative physical and psychological effects on individuals who exhibit it. Therapy for nail biting requires awareness from the subject, which in turn demands constant effort from third parties. This research project combined a commercial robotic toy with a machine vision system based on image processing to deliver a new strategy for preventing nail biting. The machine vision system recognizes nail biting from a computer webcam and signals the toy robot to alert the subject. The implementation is validated with a user case study showing a reduction in both the occurrence and duration of episodes.
{"title":"Combined Strategy of Machine Vision with a Robotic Assistant for Nail Biting Prevention","authors":"Jonathan Camargo, Aaron J. Young","doi":"10.1109/CRV.2017.57","DOIUrl":"https://doi.org/10.1109/CRV.2017.57","url":null,"abstract":"Nail biting or onychophagia is a body-focused repetitive behavior that is especially prevalent in the younger population of children and adolescents. The behavior produces negative physical and psychological effects on individuals who exhibit onychophagia. Therapy for nail biting involves awareness from the subject which requires constant effort from third parties. This research project utilized a commercial robotic toy in combination with a machine vision system based on image processing to deliver a new strategy to prevent nail biting. The machine vision system recognized nail biting using a webcam in the computer which communicated to the toy robot to alert the subject. The implementation is validated with a user case study obtaining reduction in episode occurrences and duration.","PeriodicalId":308760,"journal":{"name":"2017 14th Conference on Computer and Robot Vision (CRV)","volume":"11 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-05-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122346008","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
We consider image classification in a weakly supervised scenario where the training data are annotated at different levels of abstraction. A subset of the training data is annotated with coarse labels (e.g., wolf, dog), while the rest is annotated with fine labels (e.g., breeds of wolves and dogs). Each coarse label corresponds to a superclass of several fine labels. Our goal is to learn a model that can classify a new image into one of the fine classes. We investigate how the coarsely labeled data can help improve fine-label classification. Since it is usually much easier to collect data with coarse labels than with fine labels, the problem setup considered in this paper can benefit a wide range of real-world applications. We propose a model based on convolutional neural networks (CNNs) to address this problem. We demonstrate the effectiveness of the proposed model on several benchmark datasets. Our model significantly outperforms the naive approach that discards the extra coarsely labeled data.
{"title":"Weakly Supervised Image Classification with Coarse and Fine Labels","authors":"Jie Lei, Zhenyu Guo, Yang Wang","doi":"10.1109/CRV.2017.21","DOIUrl":"https://doi.org/10.1109/CRV.2017.21","url":null,"abstract":"We consider image classification in a weakly supervised scenario where the training data are annotated at different levels of abstractions. A subset of the training data are annotated with coarse labels (e.g. wolf, dog), while the rest of the training data are annotated with fine labels (e.g. breeds of wolves and dogs). Each coarse label corresponds to a superclass of several fine labels. Our goal is to learn a model that can classify a new image into one of the fine classes. We investigate how the coarsely labeled data can help improve the fine label classification. Since it is usually much easier to collect data with coarse labels than those with fine labels, the problem setup considered in this paper can benefit a wide range of real-world applications. We propose a model based on convolutional neural networks (CNNs) to address this problem. We demonstrate the effectiveness of the proposed model on several benchmark datasets. Our model significantly outperforms the naive approach that discards the extra coarsely labeled data.","PeriodicalId":308760,"journal":{"name":"2017 14th Conference on Computer and Robot Vision (CRV)","volume":"52 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-05-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131997614","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Anatomical landmarks on 3-D human body scans play key roles in shape-essential applications, including consistent parameterization, body measurement extraction, segmentation, and mesh re-targeting. Manually locating landmarks is tedious and time-consuming for large-scale 3-D anthropometric surveys. To automate the landmarking process, we propose a data-driven approach that learns from landmark locations known on a dataset of 3-D scans and predicts their locations on new scans. More specifically, we adopt a coarse-to-fine approach: we train a deep regression neural network to compute the locations of all landmarks and then, for each landmark, train an individual deep classification neural network to refine its location. As input to the neural networks, we compute from a frontal view three types of image renderings for comparison: gray-scale appearance images, range (depth) images, and curvature-mapped images. Among these, curvature-mapped images yield the best empirical accuracy from the deep regression network, whereas depth images lead to higher accuracy for locating most landmarks using the deep classification networks. The proposed approach performs better than the state of the art on locating most landmarks, and this simple yet effective approach can be extended to automatically locate landmarks in large-scale 3-D scan datasets.
{"title":"Localizing 3-D Anatomical Landmarks Using Deep Convolutional Neural Networks","authors":"P. Xi, Chang Shu, R. Goubran","doi":"10.1109/CRV.2017.11","DOIUrl":"https://doi.org/10.1109/CRV.2017.11","url":null,"abstract":"Anatomical landmarks on 3-D human body scans play key roles in shape-essential applications, including consistent parameterization, body measurement extraction, segmentation, and mesh re-targeting. Manually locating landmarks is tedious and time-consuming for large-scale 3-D anthropometric surveys. To automate the landmarking process, we propose a data-driven approach, which learns from landmark locations known on a dataset of 3-D scans and predicts their locations on new scans. More specifically, we adopt a coarse-to-fine approach by training a deep regression neural network to compute the locations of all landmarks and then for each landmark training an individual deep classification neural network to improve its accuracy. In regards to input images being fed into the neural networks, we compute from a frontal view three types of image renderings for comparison, i.e., gray-scale appearance images, range depth images, and curvature mapped images. Among these, curvature mapped images result in the best empirical accuracy from the deep regression network, whereas depth images lead to higher accuracy for locating most landmarks using the deep classification networks. In conclusion, the proposed approach performs better than state of the art on locating most landmarks. The simple yet effective approach can be extended to automatically locate landmarks in large scale 3-D scan datasets.","PeriodicalId":308760,"journal":{"name":"2017 14th Conference on Computer and Robot Vision (CRV)","volume":"53 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-05-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127173229","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Ankit Pensia, G. Sharma, Gaurav Pandey, J. McBride
In this paper, we report a novel algorithm for localization of autonomous vehicles in an urban environment using an orthographic ground reflectivity map created with a three-dimensional (3D) laser scanner. Road paint (lane markings, zebra crossings, traffic signs, etc.) constitutes the distinctive features in the surface reflectivity map, and these features are generally sparse compared to the uninformative asphalt and off-road portions of the map. Therefore, we propose to project the reflectivity map to a lower-dimensional space that captures the useful features of the map, and then use these projected feature maps for localization. We use a discriminative metric learning technique to obtain this lower-dimensional space of feature maps. Experimental evaluation of the proposed method on real data shows that it is more accurate than standard image matching techniques. Moreover, the proposed method is computationally fast and can run in real time (10 Hz) on a standard CPU.
{"title":"Fast Localization of Autonomous Vehicles Using Discriminative Metric Learning","authors":"Ankit Pensia, G. Sharma, Gaurav Pandey, J. McBride","doi":"10.1109/CRV.2017.56","DOIUrl":"https://doi.org/10.1109/CRV.2017.56","url":null,"abstract":"In this paper, we report a novel algorithm for localization of autonomous vehicles in an urban environment using orthographic ground reflectivity map created with a three-dimensional (3D) laser scanner. It should be noted that the road paint (lane markings, zebra crossing, traffic signs etc.) constitute the distinctive features in the surface reflectivity map which are generally sparse as compared to the non-interesting asphalt and the off-road portion of the map. Therefore, we propose to project the reflectivity map to a lower dimensional space, that captures the useful features of the map, and then use these projected feature maps for localization. We use discriminative metric learning technique to obtain this lower dimensional space of feature maps. Experimental evaluation of the proposed method on real data shows that it is better than the standard image matching techniques in terms of accuracy. Moreover, the proposed method is computationally fast and can be executed at real-time (10 Hz) on a standard CPU.","PeriodicalId":308760,"journal":{"name":"2017 14th Conference on Computer and Robot Vision (CRV)","volume":"51 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-05-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127660472","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}