
Latest publications in 2020 Digital Image Computing: Techniques and Applications (DICTA)

Space-Time Skeletal Analysis with Jointly Dual-Stream ConvNet for Action Recognition
Pub Date : 2020-11-29 DOI: 10.1109/DICTA51227.2020.9363422
Thien Huynh-The, Cam-Hao Hua, Nguyen Anh Tu, Dong-Seong Kim
In this decade, although numerous conventional methods have been introduced for three-dimensional (3D) skeleton-based human action recognition, they share a primary limitation: the recognition model is learned from low-level handcrafted features and is therefore fragile. This paper proposes an effective deep convolutional neural network (CNN) with a dual-stream architecture that simultaneously learns geometry-based static pose and dynamic motion features for high-performance action recognition. Each stream consists of several advanced blocks of regular and grouped convolutional layers, in which various kernel sizes are configured to enrich the representational features. Notably, the blocks in each stream are associated via a skip-connection scheme to overcome the vanishing-gradient problem, while the blocks of the two streams are jointly connected via a customized layer to partly share highly relevant knowledge gained during model training. In the experiments, the action recognition method is intensively evaluated on the NTU RGB+D dataset and its upgraded version with up to 120 action classes, where the proposed CNN achieves competitive performance in terms of accuracy and complexity compared to several other deep models.
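The two building blocks this abstract leans on, grouped convolution and skip connections, can be illustrated with a toy pure-Python sketch. All function names, shapes, and the channel-mixing scheme below are invented for illustration; they are not the paper's layers:

```python
# Toy 1-D stand-ins for grouped convolution and a residual skip connection.

def conv1d(signal, kernel):
    """Valid-mode 1-D convolution (cross-correlation, as in CNN layers)."""
    k = len(kernel)
    return [sum(signal[i + j] * kernel[j] for j in range(k))
            for i in range(len(signal) - k + 1)]

def grouped_conv1d(channels, kernels, groups):
    """Split channels into `groups`; each group gets its own kernel.

    With groups > 1, each kernel only sees len(channels)/groups channels,
    which is how grouped convolution cuts parameters and compute.
    """
    assert len(channels) % groups == 0 and len(kernels) == groups
    size = len(channels) // groups
    out = []
    for g in range(groups):
        # Average the group's channels, then convolve: a toy stand-in for
        # the per-group channel mixing a real grouped conv layer performs.
        group = channels[g * size:(g + 1) * size]
        mixed = [sum(col) / size for col in zip(*group)]
        out.append(conv1d(mixed, kernels[g]))
    return out

def block_with_skip(x, kernel):
    """Residual-style block: output = conv(x) + x (skip connection)."""
    y = conv1d(x, kernel)
    y = y + [0.0] * (len(x) - len(y))   # zero-pad to align for the addition
    return [a + b for a, b in zip(y, x)]
```

The skip connection gives the gradient a direct path through the addition, which is the mechanism the abstract invokes against vanishing gradients.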
Citations: 1
Efficient Brain Tumor Segmentation with Dilated Multi-fiber Network and Weighted Bi-directional Feature Pyramid Network
Pub Date : 2020-11-29 DOI: 10.1109/DICTA51227.2020.9363380
T. Nguyen, Cong Hau Le, D. V. Sang, Tingting Yao, Wei Li, Zhiyong Wang
Brain tumor segmentation is critical for precise diagnosis and personalised treatment of brain cancer. Following the recent success of deep learning, many deep learning based segmentation methods have been developed; however, most of them are computationally expensive due to complicated network architectures. Recently, multi-fiber networks were proposed to reduce the number of network parameters in U-Net based brain tumor segmentation through efficient graph convolution. However, apart from simple concatenation, the effective use of multi-scale features between the contracting and expanding paths has not been well explored. In this paper, we propose a light-weight network in which the contracting and expanding paths are connected through fused multi-scale features via a bi-directional feature pyramid network (BiFPN). The backbone of the proposed network is a U-Net architecture built on a dilated multi-fiber (DMF) structure. First, conventional convolutional layers along the contracting and expanding paths are replaced with a DMF network and an MF network, respectively, to reduce the overall network size. In addition, a learnable weighted DMF network is utilized to account effectively for different receptive field sizes. Next, a weighted BiFPN connects the contracting and expanding paths, enabling more effective and efficient information flow between the two paths with multi-scale features; the BiFPN block can be repeated as necessary. As a result, the proposed network further reduces the network size without clearly compromising segmentation accuracy. Experimental results on the popular BraTS 2018 dataset demonstrate that our light-weight architecture achieves results at least comparable with the state-of-the-art methods, with significantly reduced network complexity and computation time. The source code of this paper will be available on GitHub.
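The weighted fusion at the heart of BiFPN combines same-scale feature maps with learnable non-negative weights normalized to sum to roughly one ("fast normalized fusion"). A minimal sketch of that step, with an assumed epsilon and an invented function name, might look like:

```python
def fast_normalized_fusion(features, weights, eps=1e-4):
    """Weighted sum of same-shape feature maps with w_i >= 0, sum(w) ~ 1."""
    w = [max(0.0, wi) for wi in weights]       # ReLU keeps weights non-negative
    total = sum(w) + eps                       # eps avoids division by zero
    w = [wi / total for wi in w]
    fused = [sum(wi * f[j] for wi, f in zip(w, features))
             for j in range(len(features[0]))]
    return fused, w
```

In a real BiFPN the `weights` are trained parameters and the features come from different pyramid levels after resizing; here they are plain lists to show the arithmetic.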
Citations: 3
Moving object detection for humanoid navigation in cluttered dynamic indoor environments using a confidence tracking approach
Pub Date : 2020-11-29 DOI: 10.1109/DICTA51227.2020.9363413
Prabin Kumar Rath, A. Ramirez-Serrano, D. K. Pratihar
Humanoid robot perception is challenging compared to perception in other robotic systems. The sensors in a humanoid are in a constant state of motion, and their pose estimation is affected by the continual motion of tens of DOFs (degrees of freedom), which in turn affects the estimation of sensed environmental objects. This is especially problematic in highly cluttered dynamic spaces such as indoor office environments. One of the challenges is identifying the presence of all independently moving/dynamic entities, such as people walking around the robot. If available, such information would help humanoids build better maps and better plan their motions in unstructured, confined, dynamic environments. This paper presents a moving object detection pipeline based on relative motion and a novel confidence tracking approach that detects point clusters corresponding to independently moving entities around the robot. The detection does not depend on prior knowledge about the target entity. A ground-plane removal tool based on voxel grid covariance is used to separate point clusters of objects within the environment. The proposed method was tested using a Velodyne VLP-16 LiDAR and an Intel T265 IMU mounted on a gimbal-stabilized humanoid head. The experiments show promising results with real-time computational performance.
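The voxel-grid-covariance idea behind the ground-plane removal step can be sketched in a few lines: bin points into an XY grid and treat a cell as ground when its height variance is small and its mean height is low. The grid size and thresholds below are invented for illustration and are not the paper's values:

```python
from collections import defaultdict

def remove_ground(points, cell=1.0, z_var_max=0.01, z_mean_max=0.2):
    """Drop (x, y, z) points that fall in flat, low cells of an XY grid."""
    cells = defaultdict(list)
    for x, y, z in points:
        cells[(int(x // cell), int(y // cell))].append((x, y, z))
    kept = []
    for pts in cells.values():
        zs = [p[2] for p in pts]
        mean = sum(zs) / len(zs)
        var = sum((z - mean) ** 2 for z in zs) / len(zs)
        if var <= z_var_max and mean <= z_mean_max:
            continue                     # flat, low cell -> ground, drop it
        kept.extend(pts)
    return kept
```

A real implementation (e.g. on organized LiDAR clouds) would use the full 3x3 covariance per voxel rather than just the vertical variance, but the filtering logic is the same.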
Citations: 0
Secure Fingerprint Authentication with Homomorphic Encryption
Pub Date : 2020-11-29 DOI: 10.1109/DICTA51227.2020.9363426
Wencheng Yang, Song Wang, Kan Yu, James Jin Kang, Michael N. Johnstone
Biometric-based authentication has recently gained prevalence over traditional password- and/or token-based authentication in many applications, owing both to user convenience and to the stability/uniqueness of biometric traits. However, biometric template data, which link uniquely to a user's identity, are considered sensitive information and should therefore be secured to prevent privacy leakage. In this paper, we propose a homomorphic encryption-based fingerprint authentication system that provides access control while protecting sensitive biometric template data. With homomorphic encryption, matching of biometric data can be performed in the encrypted domain, making it difficult for attackers to obtain the original biometric template without knowing the private key. Moreover, the trade-off between computational overhead and authentication accuracy is studied and experimentally verified on a publicly available fingerprint database, FVC2002 DB2.
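To see how computation in the encrypted domain works, here is a toy, deliberately insecure Paillier cryptosystem (additively homomorphic: the product of two ciphertexts decrypts to the sum of the plaintexts). The tiny primes and helper names are illustrative assumptions, and the paper does not necessarily use Paillier; the point is only that a matching score could be accumulated without decrypting individual template values. Requires Python 3.9+ for `math.lcm` and modular-inverse `pow`:

```python
import math, random

def paillier_keygen(p=1117, q=1129):
    # Tiny primes for demonstration only; a real deployment uses ~2048-bit n.
    n = p * q
    lam = math.lcm(p - 1, q - 1)
    mu = pow(lam, -1, n)           # valid because we pick g = n + 1
    return (n, n + 1), (lam, mu)   # public (n, g), private (lam, mu)

def enc(pub, m, rng=random):
    n, g = pub
    while True:                    # random blinding factor coprime with n
        r = rng.randrange(1, n)
        if math.gcd(r, n) == 1:
            break
    return (pow(g, m, n * n) * pow(r, n, n * n)) % (n * n)

def dec(pub, priv, c):
    n, _ = pub
    lam, mu = priv
    L = (pow(c, lam, n * n) - 1) // n   # L(u) = (u - 1) / n
    return (L * mu) % n
```

Multiplying ciphertexts adds plaintexts: `dec(enc(a) * enc(b) mod n^2) == a + b`, which is the property that lets a server combine encrypted per-feature distances into an encrypted score.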
Citations: 10
W-A net: Leveraging Atrous and Deformable Convolutions for Efficient Text Detection
Pub Date : 2020-11-29 DOI: 10.1109/DICTA51227.2020.9363428
Sukhad Anand, Z. Khan
Scene text detection has been gaining a lot of attention in research. Even though recent methods can detect text of complex shapes against complex backgrounds with fairly good accuracy, they still suffer from a limited receptive field. They fail to detect extremely short or long words, and hence fail to detect text words precisely in document text images. We propose a new model, which we call W-A net because of its W shape, with the middle branch consisting of atrous convolutional layers. Our model predicts a segmentation map, which divides the image into word and non-word regions, as well as a boundary map, which helps to separate nearby words from each other. We use atrous convolutions and deformable convolutional layers to increase the receptive field, which helps to detect long words in an image. We treat text detection as a single problem irrespective of the background, making our model suitable for detecting text in both scene and document images. We present our findings on two scene text datasets and a receipt dataset. Our results show that our method outperforms recent scene text detection methods, which perform poorly on document text images, especially receipt images with short words.
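The receptive-field argument for atrous (dilated) convolution is easy to make concrete: a kernel of size k with dilation d covers (k - 1) * d + 1 input positions while keeping only k weights. A small pure-Python sketch (function names are assumptions, not the paper's code):

```python
def dilated_conv1d(signal, kernel, dilation=1):
    """1-D valid-mode convolution whose taps are `dilation` apart."""
    k = len(kernel)
    span = (k - 1) * dilation + 1           # receptive field of one output
    return [sum(signal[i + j * dilation] * kernel[j] for j in range(k))
            for i in range(len(signal) - span + 1)]

def receptive_field(kernel_size, dilation):
    return (kernel_size - 1) * dilation + 1
```

With dilation 1 a 3-tap kernel sees 3 samples; with dilation 4 the same 3 weights see a span of 9, which is why stacking atrous layers helps cover long words without extra parameters.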
Citations: 0
3D Reconstruction and Object Detection for HoloLens
Pub Date : 2020-11-29 DOI: 10.1109/DICTA51227.2020.9363378
Zequn Wu, Tianhao Zhao, Chuong V. Nguyen
Current smart glasses such as HoloLens excel at positioning within the physical environment; however, object and task recognition are still relatively primitive. We aim to expand the benefits of MR/AR systems by using semantic object recognition and 3D reconstruction. In particular, in this preliminary study we successfully use a HoloLens to build 3D maps and to recognise and count objects in a working environment. This is achieved by offloading these computationally expensive tasks to a remote GPU server. To further achieve real-time feedback and parallelise tasks, object detection is performed on 2D images and mapped into the 3D reconstructed space. Fusion of multiple 2D detection views is additionally performed to refine 3D object bounding boxes and separate nearby objects.
Citations: 5
PlaneCalib: Automatic Camera Calibration by Multiple Observations of Rigid Objects on Plane
Pub Date : 2020-11-29 DOI: 10.1109/DICTA51227.2020.9363417
Vojtech Bartl, Roman Juránek, Jakub Špaňhel, A. Herout
In this work, we propose a novel method for automatic camera calibration, aimed mainly at surveillance cameras. The calibration consists in observing objects on the ground plane of the scene; in our experiments, vehicles were used, although arbitrary rigid objects can be used instead, as verified by experiments with synthetic data. The calibration process uses a convolutional neural network to localise landmarks on the observed objects in the scene, together with the corresponding 3D positions of the localised landmarks; thus, fine-grained classification of the detected vehicles is performed in the image plane. Observation of the objects (detection, classification, and landmark detection) makes it possible to determine all typically used camera calibration parameters (focal length, rotation matrix, and translation vector). Experiments with real data show slightly better results compared with state-of-the-art work, but with an extreme speed-up: the calibration error decreased from 3.01% to 2.72%, and computation was 1223x faster.
Citations: 7
A Survey on Training Free 3D Texture-less Object Recognition Techniques
Pub Date : 2020-11-29 DOI: 10.1109/DICTA51227.2020.9363389
Piyush Joshi, Alireza Rastegarpanah, R. Stolkin
Local surface feature based 3D object recognition is a rapidly growing research field. In time-critical applications such as robotics, training-free recognition techniques are always the first choice, as they avoid heavy statistical training. This paper presents an experimental analysis of 3D texture-less object recognition techniques that require no training. To the best of our knowledge, this is the first survey that includes an experimental evaluation of top-rated training-free recognition techniques on datasets acquired with an RGB-D camera. Based on the experimentation, we briefly discuss potential future research directions.
Citations: 1
Pixel-RRT*: A Novel Skeleton Trajectory Search Algorithm for Hepatic Vessels
Pub Date : 2020-11-29 DOI: 10.1109/DICTA51227.2020.9363424
Jianfeng Zhang, Wanru Chang, Fa Wu, D. Kong
In the clinical treatment of liver diseases such as tumors, acquiring the vascular skeleton trajectory is of great value for untangling the basins and venation of the hepatic vessels, because tumors and vessels are closely intertwined. In most cases, skeletonization based on vascular segmentation results is prone to fracture owing to discontinuous vessel segmentation. As the overall tree-like system of hepatic vessels is a thin tubular tissue, we prefer to analyse the vessels from the vascular skeleton outward to the vascular boundary, rather than the contrary, which more effectively supports image computing of hepatic vessels and interpretation of their tree-like expansion. To address this issue, we propose an innovative approach, Pixel-RRT*, inspired by Murray's law and the growth rules of biological vasculature; it can be applied to skeleton trajectory search for the intricate hepatic vessels. In Pixel-RRT*, we introduce a novel pixel-based cost function, a pixel-distributed random sampling design, and a multi-goal strategy on the shared graph of the random tree, based on the general algorithmic framework of RRT* and RRT. Without any prior segmentation of the vessels, the proposed Pixel-RRT* can rapidly return rationally bifurcated vascular trajectories satisfying the principles of minimal energy and topological continuity. In addition, we put forward an adaptively interpolated variational method as a post-processing technique to smooth the vascular trajectory by means of energy minimization. Simulation experiments and hepatic vessel examples demonstrate that our method is efficient and practical. The code will be made available at https://github.com/JeffJFZ/Pixel-RRTStar.
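For readers unfamiliar with the RRT family, a generic, minimal 2D RRT is sketched below. It is not Pixel-RRT*: the pixel-based cost function, pixel-distributed sampling, and multi-goal strategy are omitted, and all parameters (sampling bounds, step size, goal bias) are invented for illustration:

```python
import math, random

def rrt(start, goal, is_free, step=0.5, goal_tol=0.5, iters=2000, seed=1):
    """Grow a tree from `start` by steering toward random samples,
    rejecting points where is_free(p) is False; return a path on success."""
    rng = random.Random(seed)
    nodes, parent = [start], {0: None}
    for _ in range(iters):
        # 10% goal bias, otherwise sample uniformly in a 10x10 workspace.
        sample = goal if rng.random() < 0.1 else (rng.uniform(0, 10),
                                                  rng.uniform(0, 10))
        i = min(range(len(nodes)), key=lambda j: math.dist(nodes[j], sample))
        nx, ny = nodes[i]
        d = math.dist((nx, ny), sample)
        if d == 0:
            continue
        new = (nx + step * (sample[0] - nx) / d,   # steer one step
               ny + step * (sample[1] - ny) / d)   # toward the sample
        if not is_free(new):
            continue                               # collision: discard
        parent[len(nodes)] = i
        nodes.append(new)
        if math.dist(new, goal) <= goal_tol:       # close enough: backtrack
            path, j = [], len(nodes) - 1
            while j is not None:
                path.append(nodes[j])
                j = parent[j]
            return path[::-1]
    return None
```

RRT* additionally rewires nearby nodes to minimise a cost function, which is the hook where Pixel-RRT*'s pixel-based cost and multi-goal bookkeeping would plug in.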
{"title":"Pixel-RRT*: A Novel Skeleton Trajectory Search Algorithm for Hepatic Vessels","authors":"Jianfeng Zhang, Wanru Chang, Fa Wu, D. Kong","doi":"10.1109/DICTA51227.2020.9363424","DOIUrl":"https://doi.org/10.1109/DICTA51227.2020.9363424","url":null,"abstract":"In the clinical treatment of liver disease such as tumor, the acquisition of vascular skeleton trajectory is of great worth to untangle the basin and venation of hepatic vessels, because tumor and vessels are closely intertwined. In most cases, skeletonization based on the results of vascular segmentation will be prone to fracture due to the discontinuous segmenting results of vessels. As the overall tree-like system of hepatic vessels is a thin tubular tissue, we expect to start the analysis of vessels from vascular skeleton to vascular boundary, not the contrary, which can more effectively implement the image computing of hepatic vessels and interpret the tree-like expansion. To this issue, in this paper, we propose an innovative approach Pixel-RRT* inspired by Marray's Law and the growing rule of biological vasculature. It can be applied to the skeleton trajectory search for the intricate hepatic vessels. In Pixel-RRT*, we introduce the novel pixel-based cost function, the design of pixel-distributed random sampling, and a multi-goal strategy in the shared graph of random tree based on the general algorithmic framework of RRT* and RRT. Without any prior segmentation of the vessels, the proposed Pixel-RRT* can rapidly return the rationally bifurcated vascular trajectories satisfying the principle of minimal energy and topological continuity. In addition, we put forward an adaptively interpolated variational method as the postprocessing technique to make the vascular trajectory smoother by the means of energy minimization. The simulation experiments and examples of hepatic vessels demonstrate our method is efficient and utilisable. 
The codes will be made available at https://github.com/JeffJFZ/Pixel-RRTStar.","PeriodicalId":348164,"journal":{"name":"2020 Digital Image Computing: Techniques and Applications (DICTA)","volume":"17 6","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-11-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"113968632","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1
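Pixel-RRT* builds on the RRT* framework named in the abstract. As a point of reference, here is a compact RRT* sketch on a 2D pixel grid; it uses uniform sampling and Euclidean edge costs, so the paper's pixel-based cost function, pixel-distributed sampling, and multi-goal strategy are deliberately not reproduced:

```python
import math
import random

class Node:
    def __init__(self, x, y, parent=None, cost=0.0):
        self.x, self.y = x, y
        self.parent = parent
        self.cost = cost                 # accumulated path cost from the root

def dist(a, b):
    return math.hypot(a.x - b.x, a.y - b.y)

def rrt_star(start, goal, width, height, is_free,
             iters=4000, step=5.0, radius=12.0, goal_tol=5.0, seed=0):
    """Plain RRT* on a width x height pixel grid; is_free(x, y) marks free pixels."""
    rng = random.Random(seed)
    nodes = [Node(*start)]
    gx, gy = goal
    best = None
    for _ in range(iters):
        sx, sy = rng.uniform(0, width), rng.uniform(0, height)
        nearest = min(nodes, key=lambda n: math.hypot(n.x - sx, n.y - sy))
        d = math.hypot(sx - nearest.x, sy - nearest.y)
        if d == 0.0:
            continue
        t = min(1.0, step / d)           # steer at most `step` pixels toward the sample
        new = Node(nearest.x + t * (sx - nearest.x),
                   nearest.y + t * (sy - nearest.y))
        if not is_free(new.x, new.y):
            continue
        # choose parent: the cheapest node within the rewiring radius
        near = [n for n in nodes if dist(n, new) <= radius]
        parent = min(near, key=lambda n: n.cost + dist(n, new))
        new.parent, new.cost = parent, parent.cost + dist(parent, new)
        nodes.append(new)
        # rewire: route nearby nodes through `new` when that is cheaper
        # (descendant costs are left stale -- a common simplification)
        for n in near:
            c = new.cost + dist(new, n)
            if c < n.cost:
                n.parent, n.cost = new, c
        if math.hypot(new.x - gx, new.y - gy) <= goal_tol and \
           (best is None or new.cost < best.cost):
            best = new
    if best is None:
        return None
    path, n = [], best
    while n is not None:
        path.append((n.x, n.y))
        n = n.parent
    return path[::-1]
```

On an open grid this recovers a near-straight start-to-goal polyline; the paper instead replaces the Euclidean edge cost with a pixel-based cost (presumably derived from image intensities) so that trajectories stay inside vessel structures, and searches toward multiple goals in one shared tree.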
Learning Affordance Segmentation: An Investigative Study
Pub Date : 2020-11-29 DOI: 10.1109/DICTA51227.2020.9363390
Chau D. M. Nguyen, S. Z. Gilani, S. Islam, D. Suter
Affordance segmentation aims at recognising, localising and segmenting affordances from images, enabling scene understanding in many robotic-perception applications. Supervised learning with deep networks has become very popular for affordance segmentation. However, very few studies have investigated the factors that contribute to improved learning of affordances. Such an investigation is essential for improving precision and balancing cost-efficiency when learning affordance segmentation. In this paper, we address this task and identify two prime factors affecting the precision of learned affordance segmentation: (1) the quality of features extracted by the classification module, and (2) the dearth of information in the Region Proposal Network (RPN). Consequently, we replace the backbone classification model and introduce a novel multiple-alignment strategy in the RPN. Our results, obtained through extensive experimentation, validate our contributions and outperform state-of-the-art affordance segmentation models.
{"title":"Learning Affordance Segmentation: An Investigative Study","authors":"Chau D. M. Nguyen, S. Z. Gilani, S. Islam, D. Suter","doi":"10.1109/DICTA51227.2020.9363390","DOIUrl":"https://doi.org/10.1109/DICTA51227.2020.9363390","url":null,"abstract":"Affordance segmentation aims at recognising, localising and segmenting affordances from images, enabling scene understanding of visual content in many applications in robotic perception. Supervised learning with deep networks has become very popular in affordance segmentation. However, very few studies have investigated the factors that contribute to improved learning of affordances. This investigation is essential to improve precision and balance cost-efficiency when learning affordance segmentation. In this paper, we address this task and identify two prime factors affecting precision of learning affordance segmentation: (1) The quality of features extracted from the classification module and (2) the dearth of information in the Region Proposal Network (RPN). Consequently, we replace the backbone classification model and introduce a novel multiple alignment strategy in the RPN. Our results obtained through extensive experimentation validate our contributions and outperform the state-of-the-art affordance segmentation models.","PeriodicalId":348164,"journal":{"name":"2020 Digital Image Computing: Techniques and Applications (DICTA)","volume":"17 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-11-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124333904","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 5
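The first factor above — feature quality in region-based models — is commonly tied to how region features are sampled from the backbone feature map. A minimal bilinear-sampling pooling routine in the spirit of RoI Align (NumPy only; this is an illustrative sketch, not the paper's actual multiple-alignment strategy):

```python
import numpy as np

def bilinear_sample(feat, y, x):
    """Bilinearly interpolate a single-channel feature map at a continuous
    coordinate (coordinates assumed to lie within the map)."""
    h, w = feat.shape
    y0, x0 = int(np.floor(y)), int(np.floor(x))
    y1, x1 = min(y0 + 1, h - 1), min(x0 + 1, w - 1)
    y0, x0 = max(y0, 0), max(x0, 0)
    dy, dx = y - y0, x - x0
    return (feat[y0, x0] * (1 - dy) * (1 - dx) + feat[y0, x1] * (1 - dy) * dx
            + feat[y1, x0] * dy * (1 - dx) + feat[y1, x1] * dy * dx)

def roi_align(feat, box, out=2, samples=2):
    """Pool a box (y1, x1, y2, x2) in continuous coordinates into an out x out
    grid, averaging samples x samples bilinear samples per output cell.
    No coordinate quantisation occurs, which is the point of aligned pooling."""
    y1, x1, y2, x2 = box
    bh, bw = (y2 - y1) / out, (x2 - x1) / out
    res = np.empty((out, out))
    for i in range(out):
        for j in range(out):
            acc = 0.0
            for si in range(samples):
                for sj in range(samples):
                    yy = y1 + bh * (i + (si + 0.5) / samples)
                    xx = x1 + bw * (j + (sj + 0.5) / samples)
                    acc += bilinear_sample(feat, yy, xx)
            res[i, j] = acc / samples ** 2
    return res
```

Because no rounding is applied when mapping the box onto the feature map, small objects and thin affordance regions keep sub-pixel localisation — the kind of alignment issue the paper targets inside the RPN.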