
Latest publications from 2020 Digital Image Computing: Techniques and Applications (DICTA)

Space-Time Skeletal Analysis with Jointly Dual-Stream ConvNet for Action Recognition
Pub Date : 2020-11-29 DOI: 10.1109/DICTA51227.2020.9363422
Thien Huynh-The, Cam-Hao Hua, Nguyen Anh Tu, Dong-Seong Kim
Over the past decade, although numerous conventional methods have been introduced for three-dimensional (3D) skeleton-based human action recognition, they share a primary limitation: learning a vulnerable recognition model from low-level handcrafted features. This paper proposes an effective deep convolutional neural network (CNN) with a dual-stream architecture that simultaneously learns geometric static-pose and dynamic-motion features for high-performance action recognition. Each stream consists of several advanced blocks of regular and grouped convolutional layers, in which various kernel sizes are configured to enrich the representational features. Notably, the blocks in each stream are linked via a skip-connection scheme to overcome the vanishing-gradient problem; meanwhile, the blocks of the two streams are jointly connected via a customized layer to partly share highly relevant knowledge gained during model training. In the experiments, the action recognition method is intensively evaluated on the NTU RGB+D dataset and its upgraded version with up to 120 action classes, where the proposed CNN achieves competitive accuracy and complexity compared with several other deep models.
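The two architectural ideas named in the abstract, a skip connection inside each stream and a customized layer that shares knowledge across the pose and motion streams, can be illustrated with a minimal pure-Python sketch. All shapes, weights, and the averaging-based sharing rule below are illustrative assumptions, not the authors' actual layers:

```python
def block(features, weight):
    """Stand-in for a convolutional block: scale every feature."""
    return [weight * f for f in features]

def residual_block(features, weight):
    """Skip connection: the block's output is added to its input,
    so gradients can also flow through the identity path."""
    return [f + g for f, g in zip(features, block(features, weight))]

def share(pose_feats, motion_feats, alpha=0.5):
    """Toy cross-stream sharing layer: each stream keeps (1 - alpha) of
    its own features and receives alpha of the other stream's."""
    shared_pose = [(1 - alpha) * p + alpha * m
                   for p, m in zip(pose_feats, motion_feats)]
    shared_motion = [(1 - alpha) * m + alpha * p
                     for p, m in zip(pose_feats, motion_feats)]
    return shared_pose, shared_motion

pose = residual_block([1.0, 2.0], weight=0.5)    # [1.5, 3.0]
motion = residual_block([2.0, 0.0], weight=0.5)  # [3.0, 0.0]
pose, motion = share(pose, motion)
```

With `alpha=0.5` the sharing layer simply averages the two streams, which is the simplest instance of "partly sharing" knowledge between them.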
Citations: 1
Efficient Brain Tumor Segmentation with Dilated Multi-fiber Network and Weighted Bi-directional Feature Pyramid Network
Pub Date : 2020-11-29 DOI: 10.1109/DICTA51227.2020.9363380
T. Nguyen, Cong Hau Le, D. V. Sang, Tingting Yao, Wei Li, Zhiyong Wang
Brain tumor segmentation is critical for precise diagnosis and personalised treatment of brain cancer. Owing to the recent success of deep learning, many deep learning based segmentation methods have been developed. However, most of them are computationally expensive due to complicated network architectures. Recently, multi-fiber networks were proposed to reduce the number of network parameters in U-Net based brain tumor segmentation through efficient graph convolution. However, apart from simple concatenation, the efficient use of multi-scale features between the contracting and expanding paths has not been well explored. In this paper, we propose a light-weight network in which the contracting and expanding paths are connected with fused multi-scale features through a bi-directional feature pyramid network (BiFPN). The backbone of our proposed network is a U-Net architecture built on a dilated multi-fiber (DMF) structure. First, conventional convolutional layers along the contracting and expanding paths are replaced with a DMF network and an MF network, respectively, to reduce the overall network size. In addition, a learnable weighted DMF network is utilized to account for different receptive field sizes effectively. Next, a weighted BiFPN connects the contracting and expanding paths, enabling more effective and efficient information flow between the two paths with multi-scale features. Note that the BiFPN block can be repeated as necessary. As a result, our proposed network is able to further reduce the network size without clearly compromising segmentation accuracy. Experimental results on the popular BraTS 2018 dataset demonstrate that our proposed light-weight architecture achieves at least comparable results with the state-of-the-art methods, with significantly reduced network complexity and computation time. The source code of this paper will be available on GitHub.
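The "weighted" fusion that BiFPN-style blocks apply when merging feature maps is commonly the fast normalized fusion rule: each input gets a learnable non-negative weight, and the output is the weight-normalized sum. A minimal sketch under that assumption (flat lists standing in for feature maps; names are illustrative only):

```python
def fused_features(feature_maps, weights, eps=1e-4):
    """Fuse same-sized feature maps: sum_i(w_i * F_i) / (sum_i w_i + eps).
    The eps term keeps the division stable when all weights shrink to zero."""
    total = sum(weights) + eps
    size = len(feature_maps[0])
    return [sum(w * fm[i] for w, fm in zip(weights, feature_maps)) / total
            for i in range(size)]

contracting = [1.0, 2.0, 3.0]  # feature map from the contracting path
expanding = [3.0, 2.0, 1.0]    # feature map from the expanding path
out = fused_features([contracting, expanding], weights=[2.0, 1.0])
```

Because the weights are normalized, the fusion stays bounded regardless of how large the learned weights grow, which is the design motivation for this rule.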
Citations: 3
Moving object detection for humanoid navigation in cluttered dynamic indoor environments using a confidence tracking approach
Pub Date : 2020-11-29 DOI: 10.1109/DICTA51227.2020.9363413
Prabin Kumar Rath, A. Ramirez-Serrano, D. K. Pratihar
Humanoid robot perception is challenging compared with perception in other robotic systems. The sensors in a humanoid are in a constant state of motion, and their pose estimation is affected by the continual motion of the tens of degrees of freedom (DOFs), which in turn affects the estimation of the sensed environmental objects. This is especially problematic in highly cluttered dynamic spaces such as indoor office environments. One of the challenges is identifying the presence of all independently moving/dynamic entities, such as people walking around the robot. If available, such information would help humanoids build better maps and better plan their motions in unstructured, confined, dynamic environments. This paper presents a moving object detection pipeline based on relative motion, together with a novel confidence tracking approach that detects point clusters corresponding to independent moving entities around the robot. The detection does not depend on prior knowledge of the target entity. A ground-plane removal tool based on voxel grid covariance is used to separate point clusters of objects within the environment. The proposed method was tested using a Velodyne VLP-16 LiDAR and an Intel T265 IMU mounted on a gimbal-stabilized humanoid head. The experiments show promising results with real-time computational performance.
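The voxel-grid step that underlies the ground-plane removal can be sketched in a few lines: each 3D point is binned by flooring its coordinates to a grid index, after which per-voxel statistics (such as the covariance the paper uses) can be computed per bin. This is a generic sketch of voxelization, not the authors' implementation; the voxel size and points are illustrative:

```python
import math
from collections import defaultdict

def voxelize(points, voxel_size):
    """Group 3D points into voxels by flooring each coordinate
    to an integer grid index."""
    grid = defaultdict(list)
    for p in points:
        key = tuple(math.floor(c / voxel_size) for c in p)
        grid[key].append(p)
    return grid

cloud = [(0.1, 0.2, 0.0), (0.3, 0.1, 0.2), (1.5, 0.0, 0.0)]
grid = voxelize(cloud, voxel_size=0.5)
# the first two points share voxel (0, 0, 0); the third falls in (3, 0, 0)
```

Once points are grouped this way, a flat ground voxel is distinguishable from an object voxel by the shape of its per-voxel covariance (one near-zero eigenvalue along the vertical).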
Citations: 0
Secure Fingerprint Authentication with Homomorphic Encryption
Pub Date : 2020-11-29 DOI: 10.1109/DICTA51227.2020.9363426
Wencheng Yang, Song Wang, Kan Yu, James Jin Kang, Michael N. Johnstone
Biometric-based authentication has recently gained prevalence over traditional password- and/or token-based authentication in many applications, owing both to user convenience and to the stability/uniqueness of biometric traits. However, biometric template data, uniquely linked to a user's identity, are considered sensitive information and should therefore be secured to prevent privacy leakage. In this paper, we propose a homomorphic encryption-based fingerprint authentication system that provides access control while protecting sensitive biometric template data. With homomorphic encryption, matching of biometric data can be performed in the encrypted domain, increasing the difficulty for attackers to obtain the original biometric template without knowing the private key. Moreover, the trade-off between computational overhead and authentication accuracy is studied and experimentally verified on a publicly available fingerprint database, FVC2002 DB2.
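The property that makes encrypted-domain matching possible is the homomorphism itself: operations on ciphertexts correspond to operations on the hidden plaintexts. A toy, completely insecure illustration of an additive homomorphism (with E(m) = g^m mod p, multiplying ciphertexts adds plaintexts) is shown below; real systems use schemes such as Paillier, and the tiny parameters here are for demonstration only, not the paper's actual scheme:

```python
P = 467  # small prime modulus (toy value, insecure)
G = 2    # generator (toy value)

def encrypt(m):
    """Toy 'encryption': E(m) = G^m mod P. Not semantically secure."""
    return pow(G, m, P)

# Additive homomorphism: E(a) * E(b) mod P == E(a + b), so a server can
# combine two encrypted template values without ever seeing a or b.
a, b = 5, 7
combined = (encrypt(a) * encrypt(b)) % P
```

The same idea, with a secure scheme, lets a matching server accumulate encrypted similarity scores between a probe and a stored template without access to the private key.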
Citations: 10
W-A net: Leveraging Atrous and Deformable Convolutions for Efficient Text Detection
Pub Date : 2020-11-29 DOI: 10.1109/DICTA51227.2020.9363428
Sukhad Anand, Z. Khan
Scene text detection has been gaining a lot of attention in research. Even though recent methods can detect text of complex shapes against complex backgrounds with fairly good accuracy, they still suffer from a limited receptive field. They fail to detect extremely short or long words and hence fail to detect text words precisely in document text images. We propose a new model, which we call W-A net because of its W shape with a middle branch of atrous convolutional layers. Our model predicts a segmentation map, which divides the image into word and non-word regions, as well as a boundary map, which helps separate closely spaced words from each other. We use atrous convolutions and deformable convolutional layers to increase the receptive field, which helps detect long words in an image. We treat text detection as a single problem irrespective of the background, making our model suitable for detecting text in both scene and document images. We present our findings on two scene text datasets and a receipt dataset. Our results show that our method outperforms recent scene text detection methods, which perform poorly on document text images, especially receipt images with short words.
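Why atrous (dilated) convolutions enlarge the receptive field can be seen in one dimension: spacing the kernel taps `dilation` apart widens the input span each output depends on without adding parameters. A minimal pure-Python sketch (valid-mode, illustrative signal and kernel):

```python
def dilated_conv1d(signal, kernel, dilation):
    """Valid-mode 1-D convolution with kernel taps spaced `dilation`
    apart; receptive field = (len(kernel) - 1) * dilation + 1."""
    span = (len(kernel) - 1) * dilation + 1
    out = []
    for start in range(len(signal) - span + 1):
        out.append(sum(k * signal[start + i * dilation]
                       for i, k in enumerate(kernel)))
    return out

x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
dense = dilated_conv1d(x, [1.0, 1.0, 1.0], dilation=1)   # receptive field 3
atrous = dilated_conv1d(x, [1.0, 1.0, 1.0], dilation=2)  # receptive field 5
```

The same three-tap kernel covers five input samples at dilation 2, which is why stacking atrous layers helps a detector see a long word as one unit.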
Citations: 0
3D Reconstruction and Object Detection for HoloLens
Pub Date : 2020-11-29 DOI: 10.1109/DICTA51227.2020.9363378
Zequn Wu, Tianhao Zhao, Chuong V. Nguyen
Current smart glasses such as the HoloLens excel at positioning within the physical environment; however, object and task recognition are still relatively primitive. We aim to expand the benefits of MR/AR systems by using semantic object recognition and 3D reconstruction. In this preliminary study, we successfully use a HoloLens to build 3D maps and to recognise and count objects in a working environment. This is achieved by offloading these computationally expensive tasks to a remote GPU server. To further achieve real-time feedback and parallelise tasks, object detection is performed on 2D images and mapped to the 3D reconstructed space. Fusion of multiple 2D detection views is additionally performed to refine 3D object bounding boxes and separate nearby objects.
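Mapping a 2D detection into reconstructed 3D space relies on the standard pinhole back-projection: a pixel with known depth becomes a camera-frame point. This is a generic sketch of that step, not the paper's pipeline, and the intrinsics below are hypothetical:

```python
def backproject(u, v, depth, fx, fy, cx, cy):
    """Back-project a pixel (u, v) with known depth to a 3D point in the
    camera frame: X = (u - cx) * depth / fx, Y = (v - cy) * depth / fy."""
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return (x, y, depth)

# hypothetical intrinsics for a 640x480 camera
point = backproject(u=420.0, v=240.0, depth=2.0,
                    fx=500.0, fy=500.0, cx=320.0, cy=240.0)
```

Back-projecting the corners of a 2D box at the object's depth yields a 3D box candidate; fusing candidates from multiple viewpoints then tightens the box and separates adjacent objects.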
Citations: 5
PlaneCalib: Automatic Camera Calibration by Multiple Observations of Rigid Objects on Plane
Pub Date : 2020-11-29 DOI: 10.1109/DICTA51227.2020.9363417
Vojtech Bartl, Roman Juránek, Jakub Špaňhel, A. Herout
In this work, we propose a novel method for automatic camera calibration, aimed mainly at surveillance cameras. The calibration consists of observing objects on the ground plane of the scene; in our experiments, vehicles were used. However, arbitrary rigid objects can be used instead, as verified by experiments with synthetic data. The calibration process uses convolutional neural network localisation of landmarks on the observed objects in the scene together with the corresponding 3D positions of the localised landmarks; thus, fine-grained classification of the detected vehicles in the image plane is performed. Observing the objects (detection, classification, and landmark detection) makes it possible to determine all typically used camera calibration parameters (focal length, rotation matrix, and translation vector). Experiments with real data show slightly better results than state-of-the-art work, but with an extreme speed-up: the calibration error decreased from 3.01% to 2.72%, and computation became 1223× faster.
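The parameters the method recovers (focal length, rotation, translation) define the forward projection model that maps 3D landmark positions to image points. A minimal pure-Python sketch of that model, with an identity rotation and illustrative values, not the paper's estimation procedure:

```python
def project(point, f, R, t, cx=0.0, cy=0.0):
    """Pinhole projection: X_cam = R @ X + t, then u = f * x / z + cx,
    v = f * y / z + cy."""
    xc = [sum(R[i][j] * point[j] for j in range(3)) + t[i] for i in range(3)]
    u = f * xc[0] / xc[2] + cx
    v = f * xc[1] / xc[2] + cy
    return (u, v)

R_identity = [[1.0, 0.0, 0.0],
              [0.0, 1.0, 0.0],
              [0.0, 0.0, 1.0]]
uv = project((1.0, 2.0, 0.0), f=100.0, R=R_identity, t=[0.0, 0.0, 5.0])
```

Calibration is the inverse problem: given many (landmark, pixel) pairs from detected vehicles, solve for f, R, and t that make this projection fit the observations.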
Citations: 7
A Survey on Training Free 3D Texture-less Object Recognition Techniques
Pub Date : 2020-11-29 DOI: 10.1109/DICTA51227.2020.9363389
Piyush Joshi, Alireza Rastegarpanah, R. Stolkin
Local surface feature based 3D object recognition is a rapidly growing research field. In time-critical applications such as robotics, training-free recognition techniques are always the first choice, as they avoid heavy statistical training. This paper presents an experimental analysis of 3D texture-less object recognition techniques that are free from any training. To the best of our knowledge, this is the first survey that includes an experimental evaluation of top-rated training-free recognition techniques on datasets acquired by an RGBD camera. Based on the experimentation, we briefly discuss potential future research directions.
Citations: 1
An improved method for pylon extraction and vegetation encroachment analysis in high voltage transmission lines using LiDAR data
Pub Date : 2020-11-29 DOI: 10.1109/DICTA51227.2020.9363391
Nosheen Munir, M. Awrangjeb, Bela Stantic
Maintaining high-voltage power line rights-of-way against vegetation intrusions is important for electric power distribution companies to deliver electricity safely and securely. However, monitoring becomes more challenging when the power line corridor (PLC) lies in a complex environment such as mountainous terrain or forest. To overcome these challenges, this paper provides an automated method for extracting individual pylons and monitoring vegetation near the PLC in hilly terrain. The proposed method starts by dividing the large dataset into small, manageable datasets. A voxel grid is formed on each dataset to separate power lines from pylons and vegetation. The power line points are converted into a binary image to obtain the individual spans. These span points are used to find nearby vegetation and pylon points, and individual pylons and vegetation are further separated using statistical analysis. Finally, the height and location of the extracted vegetation relative to the power lines are estimated and separated into danger and clearance zones. Experiments on two large Australian datasets show that the proposed method achieves high completeness and correctness of 96.5% and 99% for pylons, respectively. Moreover, the growing vegetation beneath and around the PLC that can harm the power lines is identified.
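The final step, splitting vegetation into danger and clearance zones by height relative to the conductors, reduces to a threshold test on the vertical gap. A minimal sketch under assumed inputs (the 3 m clearance threshold, point format, and function name are all illustrative, not the paper's values):

```python
def classify_vegetation(veg_points, line_height, clearance=3.0):
    """Split vegetation points into danger and clearance zones by the gap
    between canopy top and conductor height (threshold is illustrative)."""
    danger, clear = [], []
    for x, y, height in veg_points:
        if line_height - height < clearance:
            danger.append((x, y, height))
        else:
            clear.append((x, y, height))
    return danger, clear

# (x, y, canopy height in metres) for extracted vegetation points
vegetation = [(0.0, 1.0, 9.0), (5.0, 2.0, 4.0)]
danger, clear = classify_vegetation(vegetation, line_height=10.0)
```

In practice the conductor height varies along the span, so `line_height` would be interpolated per vegetation point from the extracted power line model rather than held constant.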
{"title":"An improved method for pylon extraction and vegetation encroachment analysis in high voltage transmission lines using LiDAR data","authors":"Nosheen Munir, M. Awrangjeb, Bela Stantic","doi":"10.1109/DICTA51227.2020.9363391","DOIUrl":"https://doi.org/10.1109/DICTA51227.2020.9363391","url":null,"abstract":"The maintenance of high-voltage power lines rights-of-way due to vegetation intrusions is important for electric power distribution companies for safe and secure delivery of electricity. However, the monitoring becomes more challenging if power line corridor (PLC) exists in complex environment such as mountainous terrains or forests. To overcome these challenges, this paper aims to provide an automated method for extraction of individual pylons and monitoring of vegetation near the PLC in hilly terrain. The proposed method starts off by dividing the large dataset into small manageable datasets. A voxel grid is formed on each dataset to separate power lines from pylons and vegetation. The power line points are converted into a binary image to get the individual spans. These span points are used to find nearby vegetation and pylon points and individual pylons and vegetation are further separated using a statistical analysis. Finally, the height and location of extracted vegetation with reference to power lines are estimated and separated into danger and clearance zones. The experiment on two large Australian datasets shows that the proposed method provides high completeness and correctness of 96.5% and 99% for pylons, respectively. 
Moreover, the growing vegetation beneath and around the PLC that can harm the power lines is identified.","PeriodicalId":348164,"journal":{"name":"2020 Digital Image Computing: Techniques and Applications (DICTA)","volume":"71 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-11-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124847170","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 6
Learning Affordance Segmentation: An Investigative Study
Pub Date : 2020-11-29 DOI: 10.1109/DICTA51227.2020.9363390
Chau D. M. Nguyen, S. Z. Gilani, S. Islam, D. Suter
Affordance segmentation aims to recognise, localise and segment affordances in images, enabling scene understanding of visual content in many robotic-perception applications. Supervised learning with deep networks has become very popular in affordance segmentation. However, very few studies have investigated the factors that contribute to improved learning of affordances. Such an investigation is essential for improving precision and balancing cost-efficiency when learning affordance segmentation. In this paper, we address this task and identify two prime factors affecting the precision of learned affordance segmentation: (1) the quality of the features extracted by the classification module, and (2) the dearth of information in the Region Proposal Network (RPN). Consequently, we replace the backbone classification model and introduce a novel multiple-alignment strategy in the RPN. Our results, obtained through extensive experimentation, validate these contributions and outperform state-of-the-art affordance segmentation models.
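The abstract does not specify the multiple-alignment strategy. As a hedged illustration only, the sketch below pools the same region of a feature map at several grid resolutions and concatenates the flattened results — one plausible reading of aligning a proposal at multiple scales; all names, grid sizes, and the nearest-neighbour sampling are assumptions, not the paper's method.

```python
import numpy as np

def roi_pool(feature, box, out):
    """Nearest-neighbour RoI pooling of `feature` (H x W x C) over
    `box` = (x0, y0, x1, y1) in pixels, onto an out x out grid."""
    x0, y0, x1, y1 = box
    xs = np.linspace(x0, x1 - 1, out).round().astype(int)
    ys = np.linspace(y0, y1 - 1, out).round().astype(int)
    return feature[np.ix_(ys, xs)]  # shape (out, out, C)

def multi_alignment(feature, box, sizes=(3, 5, 7)):
    """Pool one region at several resolutions and concatenate the
    flattened results into a single descriptor for the proposal."""
    return np.concatenate([roi_pool(feature, box, s).ravel() for s in sizes])
```

With a single-channel feature map and the default grid sizes, the descriptor has 3*3 + 5*5 + 7*7 = 83 elements per proposal.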
{"title":"Learning Affordance Segmentation: An Investigative Study","authors":"Chau D. M. Nguyen, S. Z. Gilani, S. Islam, D. Suter","doi":"10.1109/DICTA51227.2020.9363390","DOIUrl":"https://doi.org/10.1109/DICTA51227.2020.9363390","url":null,"abstract":"Affordance segmentation aims at recognising, localising and segmenting affordances from images, enabling scene understanding of visual content in many applications in robotic perception. Supervised learning with deep networks has become very popular in affordance segmentation. However, very few studies have investigated the factors that contribute to improved learning of affordances. This investigation is essential to improve precision and balance cost-efficiency when learning affordance segmentation. In this paper, we address this task and identify two prime factors affecting precision of learning affordance segmentation: (1) The quality of features extracted from the classification module and (2) the dearth of information in the Region Proposal Network (RPN). Consequently, we replace the backbone classification model and introduce a novel multiple alignment strategy in the RPN. Our results obtained through extensive experimentation validate our contributions and outperform the state-of-the-art affordance segmentation models.","PeriodicalId":348164,"journal":{"name":"2020 Digital Image Computing: Techniques and Applications (DICTA)","volume":"17 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-11-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124333904","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 5
Journal: 2020 Digital Image Computing: Techniques and Applications (DICTA)