Latest publications from 2019 Digital Image Computing: Techniques and Applications (DICTA)

An Automated Method for Individual Wire Extraction from Power Line Corridor using LiDAR Data
Pub Date : 2019-12-01 DOI: 10.1109/DICTA47822.2019.8946085
Nosheen Munir, M. Awrangjeb, Bela Stantic
The rapid development of electricity infrastructure, driven by growing domestic and commercial demand, requires safe and secure maintenance of the power line corridor (PLC) to deliver an uninterrupted power supply to consumers. Inspection of PLCs using light detection and ranging (LiDAR) has gained importance over the past decade. In this paper, we present a new method for automatically extracting wires from a PLC using LiDAR point cloud data. First, the pylons and the vegetation around the PLC are removed using their shape and area properties. The pylon locations are then used to extract the wire points span by span, a span being the stretch between two consecutive pylons. Each span is divided into several segments, each containing a slice of the wires. A clustering algorithm based on Euclidean distance separates the wires lying at different height levels within each segment, and the wire count of each cluster is derived from defined distance thresholds. The points of each cluster are projected onto a plane perpendicular to the wire direction. Finally, individual wires are separated by fitting a 2D line and applying the point-to-line distance formula. Results are presented on different spans from complex and semi-complex datasets with low point density. Experimental results show that the proposed method extracts single and bundled wires with a completeness of 99% and a correctness of 100%.
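The per-segment separation step lends itself to a short sketch. Below is a minimal Python illustration of the idea: cluster a segment's points by height, then split bundle wires with a 2D line fit and the point-to-line distance. The thresholds, the DBSCAN choice, and the assumption that the span runs along the x-axis are ours, not the authors'.

```python
import numpy as np
from sklearn.cluster import DBSCAN

def separate_wires(segment_points, height_eps=0.5, line_dist_thresh=0.3):
    """Split one segment's LiDAR points into individual wires.

    segment_points: (N, 3) array of (x, y, z) points; the span is assumed
    to run along x. height_eps and line_dist_thresh (metres) are assumed
    values, not the paper's thresholds.
    """
    wires = []
    # 1. Euclidean clustering on height to separate wire levels.
    labels = DBSCAN(eps=height_eps, min_samples=5).fit_predict(segment_points[:, 2:3])
    for lbl in set(labels) - {-1}:                    # -1 marks noise points
        cluster = segment_points[labels == lbl]
        # 2. Project onto the plane perpendicular to the wire direction:
        #    for a span along x, the (y, z) coordinates suffice.
        y, z = cluster[:, 1], cluster[:, 2]
        # 3. Fit a 2D line z = a*y + b, then use the point-to-line distance.
        a, b = np.polyfit(y, z, deg=1)
        dist = (a * y - z + b) / np.sqrt(a ** 2 + 1)  # signed distance
        if np.abs(dist).max() <= line_dist_thresh:
            wires.append(cluster)                     # a single wire
        else:                                         # a bundle: split by side
            wires.append(cluster[dist >= 0])
            wires.append(cluster[dist < 0])
    return wires
```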
Citations: 6
Logical Layout Analysis using Deep Learning
Pub Date : 2019-12-01 DOI: 10.1109/DICTA47822.2019.8946046
Annus Zulfiqar, A. Ul-Hasan, F. Shafait
Logical layout analysis plays an important part in document understanding. It can be a challenging task due to varying formats and layouts. Researchers have proposed different ways to solve this problem, mostly relying on visual information in some form and on complex pipelines. In this paper, we present a simple technique for labelling the logical structures in document images, using both visual and textual features to label zones. We utilize recurrent neural networks, specifically a two-layer LSTM, whose input is the text of the zone to be classified, as a sequence of words, together with each word's position normalized by the page width and height. The image under test is compared with known layouts, and labels are assigned to zones accordingly. The labels are abstract, title, author names, and affiliation; the text itself also carries very important information for the task at hand. The presented approach achieved an overall accuracy of 96.21% on the publicly available MARG dataset.
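A minimal sketch of a two-layer LSTM zone classifier of the kind described above follows. The label set matches the paper's four labels, but the embedding size, hidden width, and all names are illustrative assumptions, not the authors' code.

```python
import torch
import torch.nn as nn

class ZoneClassifier(nn.Module):
    def __init__(self, vocab_size, embed_dim=64, hidden_dim=128, n_labels=4):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        # Each timestep: word embedding + (x, y) position normalized by page size.
        self.lstm = nn.LSTM(embed_dim + 2, hidden_dim, num_layers=2, batch_first=True)
        self.head = nn.Linear(hidden_dim, n_labels)  # abstract/title/authors/affiliation

    def forward(self, word_ids, norm_xy):
        # word_ids: (B, T) token indices; norm_xy: (B, T, 2) positions in [0, 1].
        x = torch.cat([self.embed(word_ids), norm_xy], dim=-1)
        _, (h, _) = self.lstm(x)
        return self.head(h[-1])  # classify the zone from the final hidden state
```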
Citations: 5
Fast Point Cloud Registration using Semantic Segmentation
Pub Date : 2019-12-01 DOI: 10.1109/DICTA47822.2019.8945870
Giang Truong, S. Z. Gilani, S. Islam, D. Suter
Deep learning has recently delivered relatively high-quality semantic segmentation of visual and point-cloud data. This paper is primarily concerned with the use of such semantic segmentation for point cloud registration. In particular, we are motivated by the need to speed up, for large-scale datasets, registration algorithms that guarantee optimality (in terms of maximising consensus). That semantic information can help prune bad hypotheses for point matches is rather obvious, and we demonstrate one such relatively simple approach by modifying a recent optimal registration algorithm [6] to take advantage of semantic information. However, we also make another contribution by proposing a novel variation of deep learning approaches to point cloud registration. Again, our motivation is handling large datasets, and in this case we provide an algorithm that performs on par with the state of the art on the semantic segmentation task. In short, we have shown how to speed up both the generation of the semantic information and the use of that semantic information to speed up point cloud registration, in the context of large-scale point cloud datasets.
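The "rather obvious" pruning idea can be stated in a few lines: before running the expensive consensus-maximising registration, discard candidate correspondences whose endpoints carry different semantic labels. A minimal sketch, with illustrative names rather than the authors' code, follows.

```python
import numpy as np

def prune_matches(matches, src_labels, dst_labels):
    """Keep only candidate correspondences whose endpoints share a semantic class.

    matches: (M, 2) integer array of (source_index, target_index) pairs.
    src_labels, dst_labels: per-point semantic class ids for the two clouds.
    """
    keep = src_labels[matches[:, 0]] == dst_labels[matches[:, 1]]
    return matches[keep]  # the optimal registration then searches a smaller set
```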
Citations: 25
Multimodal Brain Tumour Segmentation using Densely Connected 3D Convolutional Neural Network
Pub Date : 2019-12-01 DOI: 10.1109/DICTA47822.2019.8946023
M. Ghaffari, A. Sowmya, R. Oliver, Len Hamey
Reliable methods for segmenting brain tumours from brain scans are essential for accurate diagnosis and treatment planning. In this paper, we propose a semantic segmentation method based on convolutional neural networks for brain tumour segmentation using multimodal brain scans. The proposed model is a modified version of the well-known U-net architecture. It benefits from DenseNet blocks between the encoder and decoder parts of the U-net, which transfer more semantic information from the input to the output. In addition, to speed up the training process, we employ deep supervision by adding segmentation blocks at the end of the decoder layers and summing their outputs to generate the final output of the network. We trained and evaluated our model using the BraTS 2018 dataset. Compared with a generic U-net, our model achieved higher segmentation accuracy in terms of the Dice score.
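The deep-supervision detail admits a compact sketch: attach a 1x1x1-convolution segmentation head to each decoder stage, upsample each head's logits to the output resolution, and sum them. A minimal PyTorch illustration follows; the channel counts and class count are assumptions, not values from the paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DeepSupervisionHeads(nn.Module):
    def __init__(self, decoder_channels=(256, 128, 64), n_classes=4):
        super().__init__()
        # One 1x1x1 segmentation head per decoder stage (channel counts assumed).
        self.heads = nn.ModuleList(
            nn.Conv3d(c, n_classes, kernel_size=1) for c in decoder_channels)

    def forward(self, decoder_feats, out_size):
        # decoder_feats: list of (B, C_i, D_i, H_i, W_i) tensors, coarse to fine.
        logits = [F.interpolate(head(f), size=out_size, mode='trilinear',
                                align_corners=False)
                  for head, f in zip(self.heads, decoder_feats)]
        return torch.stack(logits).sum(dim=0)  # summed deep-supervised output
```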
Citations: 7
Deep Corrosion Assessment for Electrical Transmission Towers
Pub Date : 2019-12-01 DOI: 10.1109/DICTA47822.2019.8945905
Teng Zhang, Liangchen Liu, A. Wiliem, Stephen Connor, Zelkjo Ilich, Eddie Van Der Draai, B. Lovell
Galvanised steel transmission towers in electrical power grids suffer varying levels of corrosion depending on their age and environment. To ensure that the power grid can operate safely, significant resources are spent monitoring the corrosion level of towers. Photographs from helicopters, drones, and a variety of staff are often used to capture tower condition; however, these images still need manual inspection to determine the corrosion level before maintenance work is carried out. In this paper, we describe a framework employing multiple deep-neural-network-based classifiers and detectors to perform automatic image-based condition monitoring for steel transmission towers. Given an arbitrary variety of images of a tower, our proposed framework first determines the location of each image on the structure via a trained zone classifier. Fine-grained corrosion inspection is then performed on fasteners and structural members, respectively. In addition, an automatic zoom-in function is applied to images that have high resolution but were taken from a long distance, ensuring detection performance on small objects on the tower. Finally, the overall corrosion status report for the tower is calculated and generated automatically. We have also released a subset of our data to contribute to this novel direction. Experiments show that our framework can assess towers efficiently.
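The described stages chain together naturally; the following Python sketch shows one plausible orchestration. Every model object, method name, and threshold here is a placeholder assumption standing in for the paper's trained components, not their actual API.

```python
def assess_tower(images, zone_classifier, detector, zoom_area_ratio=0.02):
    """Hypothetical orchestration of the zone -> inspect -> zoom -> report flow."""
    report = {}
    for img in images:
        zone = zone_classifier.predict(img)           # 1. locate image on the tower
        detections = detector.detect(img)             # 2. fasteners / structural members
        levels = []
        for det in detections:
            crop = img.crop(det.box)
            if det.area_ratio < zoom_area_ratio:      # 3. auto zoom-in on small,
                crop = crop.resize_to(detector.input_size)  #    distant objects
            levels.append(detector.corrosion_level(crop))
        # 4. keep the worst corrosion level observed for each zone
        report[zone] = max(levels + [report.get(zone, 0)])
    return report                                     # overall status per zone
```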
Citations: 2
Efficient Block Pruning Based on Kernel and Feature Stablization
Pub Date : 2019-12-01 DOI: 10.1109/DICTA47822.2019.8946001
Sheng Xu, Hanlin Chen, Kexin Liu, Jinhu Lü, Baochang Zhang
With the development of computer vision research, the architectures of convolutional neural networks have become more and more complex in order to reach state-of-the-art performance. Is the complexity of a model necessarily proportional to its accuracy? Prompted by this question, network compression has attracted much attention in academia and industry. Existing network pruning methods mostly rely on scoring the complexity or diversity of kernels to compress the network, and then rebuild the network model after removing kernels by tuning or training on the input data. These methods are cumbersome and depend on a well-trained pre-trained model. In this paper, we propose an end-to-end method that prunes blocks efficiently based on kernel and feature stability. To accomplish this, we first introduce a mask to scale the output of the blocks, with an L1 regularization term to govern the mask update. Second, we introduce the center loss to guarantee that the features do not deviate greatly during learning. For fast convergence, we optimize the mask with the fast iterative shrinkage-thresholding algorithm (FISTA), which yields a faster and more reliable pruning process. We run experiments on several datasets, including CIFAR-10 and ImageNet ILSVRC2012, all of which achieve state-of-the-art accuracy.
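A minimal sketch of the masking mechanism is given below: each block's output is scaled by a learnable scalar mask, an L1 penalty pushes masks toward zero so whole blocks can be pruned, and FISTA's proximal step reduces to soft-thresholding the masks. The residual wiring, the penalty weight, and the helper names are our assumptions.

```python
import torch
import torch.nn as nn

class MaskedBlock(nn.Module):
    """Wrap a block so its output is scaled by a learnable, prunable mask."""
    def __init__(self, block):
        super().__init__()
        self.block = block
        self.mask = nn.Parameter(torch.ones(1))  # per-block scaling mask

    def forward(self, x):
        return x + self.mask * self.block(x)     # assumed residual wiring

def l1_mask_penalty(model, lam=1e-4):
    """L1 term on the masks, added to the task loss (lam is an assumed weight)."""
    masks = [m.mask.abs() for m in model.modules() if isinstance(m, MaskedBlock)]
    return lam * torch.cat(masks).sum()

def soft_threshold(mask, lam, lr):
    """Proximal step used inside FISTA: shrink masks toward exact zeros."""
    return mask.sign() * torch.clamp(mask.abs() - lr * lam, min=0.0)
```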
Citations: 1
OGaze: Gaze Prediction in Egocentric Videos for Attentional Object Selection
Pub Date : 2019-12-01 DOI: 10.1109/DICTA47822.2019.8945893
Mohammad Al-Naser, Shoaib Ahmed Siddiqui, Hiroki Ohashi, Sheraz Ahmed, Nakamura Katsuyki, Takuto Sato, A. Dengel
This paper proposes a novel gaze-estimation model for attentional object selection tasks. The key features of our model are two-fold: (i) the use of deformable convolutional layers to better incorporate the spatial dependencies of differently shaped objects and the background, and (ii) the formulation of the gaze-estimation problem in two different ways, i.e. as a classification problem as well as a regression problem. We combine the two formulations using a joint loss that incorporates both cross-entropy and mean-squared error to train our model. Experimental results on two publicly available datasets indicate that our model not only achieved real-time performance (13–18 FPS) but also outperformed the state-of-the-art models on the OSdataset, with comparable performance on the GTEA-plus dataset.
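The joint loss is simple to write down. The sketch below combines a cross-entropy term over discretized gaze cells with a mean-squared error over continuous gaze coordinates; the weighting scheme and all names are assumptions, since the paper's exact balance is not given here.

```python
import torch.nn as nn

class JointGazeLoss(nn.Module):
    def __init__(self, alpha=0.5):
        super().__init__()
        self.alpha = alpha               # assumed balance between the two terms
        self.ce = nn.CrossEntropyLoss()  # classification over discretized gaze cells
        self.mse = nn.MSELoss()          # regression of continuous (x, y) gaze

    def forward(self, cell_logits, cell_target, xy_pred, xy_target):
        return (self.alpha * self.ce(cell_logits, cell_target)
                + (1.0 - self.alpha) * self.mse(xy_pred, xy_target))
```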
Citations: 7
Indian Sign Language Gesture Recognition using Image Processing and Deep Learning
Pub Date : 2019-12-01 DOI: 10.1109/DICTA47822.2019.8945850
N. Bhagat, Y. Vishnusai, G. Rathna
Speech-impaired people use hand gestures to communicate. Unfortunately, the vast majority of people are not aware of the semantics of these gestures. In an attempt to bridge this gap, we propose a real-time hand gesture recognition system based on data captured by the Microsoft Kinect RGB-D camera. Given that there is no one-to-one mapping between the pixels of the depth camera and those of the RGB camera, we used computer vision techniques such as 3D reconstruction and affine transformation. After achieving the one-to-one mapping, the hand gestures were segmented from the background noise. Convolutional neural networks (CNNs) were used to train 36 static gestures covering Indian Sign Language (ISL) alphabets and numbers. Trained on 45,000 RGB images and 45,000 depth images, the model achieved an accuracy of 98.81%. Convolutional LSTMs were further used to train 10 dynamic ISL word gestures, and an accuracy of 99.08% was obtained by training on 1,080 videos. The model showed accurate real-time performance in predicting static ISL gestures, leaving scope for further research on sentence formation through gestures. The model also adapted competitively to American Sign Language (ASL) gestures: transfer-learning the ISL model weights to ASL yielded 97.71% accuracy.
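Once depth and RGB are aligned, segmenting the gesturing hand from background noise can be as simple as a depth-range mask. A minimal sketch follows; the depth range and kernel size are assumptions, and the alignment itself is taken as given.

```python
import cv2
import numpy as np

def segment_hand(rgb, depth_aligned, near_mm=400, far_mm=900):
    """Keep RGB pixels whose aligned depth falls inside an assumed hand range."""
    mask = ((depth_aligned > near_mm) & (depth_aligned < far_mm)).astype(np.uint8) * 255
    # Morphological opening suppresses isolated depth-noise speckles.
    mask = cv2.morphologyEx(mask, cv2.MORPH_OPEN, np.ones((5, 5), np.uint8))
    return cv2.bitwise_and(rgb, rgb, mask=mask)
```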
Citations: 33
Radiography Contrast Enhancement: Smoothed LHE Filter a Practical Solution for Digital X-Rays with Mach Band
Pub Date : 2019-12-01 DOI: 10.1109/DICTA47822.2019.8946114
P. Ambalathankandy, Yafei Ou, Jyotsna Kochiyil, Shinya Takamaeda-Yamazaki, M. Motomura, T. Asai, M. Ikebe
In this paper, we analyze and demonstrate the usefulness of smoothed LHE (local histogram equalization) filters for processing low-contrast images such as digital radiographs. Digital X-rays are known to exhibit optical illusions such as Mach bands and background contrast effects, which are caused by lateral inhibition phenomena. We observe that using multilayer (ML) methods with the latest edge-preserving filters for contrast enhancement in medical images can be problematic: user-defined gain settings can boost texture uncontrollably, exaggerating detail and potentially leading to faulty diagnosis. ML filters are also designed with a few subjectively selected kernel sizes, which can result in unnaturalness in the output images. We propose a smoothed LHE-like filter with adaptive gain control that is more robust and can enhance fine details in digital X-rays while maintaining their intrinsic naturalness. Preserving naturalness in X-ray images is essential for radiographic diagnostics. Our proposed filter has O(1) complexity and can easily be controlled and operated with a continuously varying kernel size; it functions like an active high-pass filter, amplifying all frequencies within the kernel.
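One way to read "LHE-like filter with adaptive gain control acting as an active high-pass filter" is sketched below: subtract a local mean, re-amplify the detail with a gain that adapts to local contrast, and add it back. This is our interpretation with assumed kernel size and gain bounds, not the paper's exact filter.

```python
import cv2
import numpy as np

def smoothed_lhe(x, ksize=65, max_gain=3.0, eps=1e-3):
    """Adaptive-gain detail enhancement (our interpretation, assumed parameters)."""
    x = x.astype(np.float32)
    base = cv2.boxFilter(x, -1, (ksize, ksize))            # local mean (low-pass)
    detail = x - base                                      # high-pass component
    local_std = np.sqrt(cv2.boxFilter(detail ** 2, -1, (ksize, ksize)))
    # Weak local contrast gets a larger gain; strong contrast is left alone.
    gain = np.clip(1.0 / (local_std / (local_std.mean() + eps) + eps), 1.0, max_gain)
    return np.clip(base + gain * detail, 0, x.max())
```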
Citations: 1
Feature Engineering Meets Deep Learning: A Case Study on Table Detection in Documents
Pub Date : 2019-12-01 DOI: 10.1109/DICTA47822.2019.8945929
M. Shahzad, Rabeya Noor, Sheraz Ahmad, A. Mian, F. Shafait
Traditional computer vision approaches relied heavily on hand-crafted features for tasks such as visual object detection and recognition. The recent success of deep learning in automatically extracting representative and powerful features from images has brought a paradigm shift to this area. As a side effect, decades of research into hand-crafted features is now considered outdated. In this paper, we present an approach to table detection in which we augment a deep-learning-based table detection model with hand-crafted features from a classical table detection method. We demonstrate that with a suitable encoding of the hand-crafted features, the deep learning model performs better at the detection task. Experiments on the publicly available UNLV dataset show that the presented method achieves accuracy comparable with state-of-the-art deep learning methods without the need for extensive hyper-parameter tuning.
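What "a suitable encoding of the hand-crafted features" could look like in practice: one common option is to render classical feature responses as extra image channels that the detector consumes alongside the page itself. The sketch below uses morphological ruling-line responses purely as an illustrative assumption; the paper's actual feature choice may differ.

```python
import cv2
import numpy as np

def encode_page(gray):
    """Stack the page with horizontal/vertical ruling-line responses (H, W, 3)."""
    inv = 255 - gray                                   # ink becomes bright
    h_kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (31, 1))
    v_kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (1, 31))
    h_lines = cv2.morphologyEx(inv, cv2.MORPH_OPEN, h_kernel)  # horizontal rulings
    v_lines = cv2.morphologyEx(inv, cv2.MORPH_OPEN, v_kernel)  # vertical rulings
    return np.dstack([gray, h_lines, v_lines]).astype(np.float32) / 255.0
```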
Citations: 0