
Latest Publications from 2019 Digital Image Computing: Techniques and Applications (DICTA)

Automated Building Footprint and 3D Building Model Generation from Lidar Point Cloud Data
Pub Date : 2019-12-01 DOI: 10.1109/DICTA47822.2019.8946008
F. T. Kurdi, M. Awrangjeb, Alan Wee-Chung Liew
Although much effort has been spent on developing a stable algorithm for 3D building modelling from Lidar data, this topic still attracts considerable attention in the literature. A key task in this problem is automatic building roof segmentation. Due to the great diversity of building typology, and the noisiness and heterogeneity of point cloud data, the building roof segmentation result needs to be verified/rectified with geometric constraints before it is used to generate the 3D building models. Otherwise, the generated building model may suffer from undesirable deformations. This paper suggests generating the 3D building model from Lidar data in two steps. The first step is automatic 2D building modelling, and the second step is the automatic conversion of the 2D building model into a 3D model. This approach allows the 2D building model to be refined before the 3D building model generation starts. Furthermore, it yields the 2D and 3D building models simultaneously. The first step of the proposed algorithm is the generation of the 2D building model. Then, after enhancing and fitting the roof planes, the roof plane boundaries are converted into 3D by analysing the relationships between neighbouring planes. This is followed by the adjustment of the 3D roof vertices. Experiments indicated that the proposed algorithm is accurate and robust in generating 3D building models from Lidar data.
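As a hedged illustration of the 2D-to-3D conversion step (fitting roof planes and lifting plane boundaries into 3D), the following Python sketch shows the core geometry under simple assumptions; the function names and the vertex-averaging rule are illustrative, not the authors' implementation.

```python
# Hypothetical sketch: fit a plane to each segmented roof region and lift the
# 2D boundary vertices into 3D; names and the averaging rule are illustrative.
import numpy as np

def fit_roof_plane(points):
    """Least-squares fit of z = a*x + b*y + c to one roof segment's points."""
    A = np.c_[points[:, 0], points[:, 1], np.ones(len(points))]
    coeffs, *_ = np.linalg.lstsq(A, points[:, 2], rcond=None)
    return coeffs  # (a, b, c)

def lift_boundary(boundary_xy, plane):
    """Convert a 2D roof-plane boundary into 3D using the fitted plane."""
    a, b, c = plane
    z = a * boundary_xy[:, 0] + b * boundary_xy[:, 1] + c
    return np.c_[boundary_xy, z]

def adjust_shared_vertex(vertex_xy, planes):
    """Average the heights predicted by neighbouring planes at a shared vertex."""
    zs = [a * vertex_xy[0] + b * vertex_xy[1] + c for a, b, c in planes]
    return np.array([vertex_xy[0], vertex_xy[1], np.mean(zs)])

# Toy usage: one slightly tilted roof segment sampled with noise.
rng = np.random.default_rng(0)
xy = rng.uniform(0, 10, size=(200, 2))
z = 0.1 * xy[:, 0] - 0.05 * xy[:, 1] + 5 + rng.normal(0, 0.02, 200)
plane = fit_roof_plane(np.c_[xy, z])
print(lift_boundary(np.array([[0.0, 0.0], [10.0, 0.0]]), plane))
print(adjust_shared_vertex(np.array([5.0, 5.0]), [plane, plane]))
```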
Citations: 3
Tree Log Identity Matching using Convolutional Correlation Networks
Pub Date : 2019-12-01 DOI: 10.1109/DICTA47822.2019.8945865
Mikko Vihlman, Jakke Kulovesi, A. Visala
Log identification is an important task in silviculture and forestry. It involves matching tree logs with each other and determining which of the known individuals a given specimen is. Forest harvesters can image the logs and assess their quality while cutting trees in the forest. Identification allows each log to be traced back to the location it was grown in and enables efficient selection of logs of specific quality at the sawmill. In this paper, a deep two-stream convolutional neural network is used to measure the likelihood that a pair of images represents the same part of a log. The similarity between the images is assessed based on the cross-correlation of the convolutional feature maps at one or more levels of the network. The performance of the network is evaluated with two large datasets, containing either spruce or pine logs. The best architecture correctly identifies 99% of the test logs in the spruce dataset and 97% of the test logs in the pine dataset. The results show that the proposed model performs very well under relatively good conditions. The analysis forms a basis for future attempts to utilize deep networks for log identification in challenging real-world forestry applications.
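The following minimal PyTorch sketch illustrates the core idea of correlating convolutional feature maps from two image streams; the backbone, normalization, and pooling choices are assumptions rather than the paper's exact architecture.

```python
# A minimal two-stream sketch, assuming a shared CNN backbone and a
# channel-wise cross-correlation of feature maps as the similarity signal.
import torch
import torch.nn as nn
import torch.nn.functional as F

class CorrelationMatcher(nn.Module):
    def __init__(self):
        super().__init__()
        # Shared (Siamese) feature extractor for both image streams.
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
        )
        self.head = nn.Linear(64, 1)  # maps pooled correlation to a match score

    def forward(self, a, b):
        fa, fb = self.backbone(a), self.backbone(b)
        fa = F.normalize(fa, dim=1)  # unit-norm features per spatial location
        fb = F.normalize(fb, dim=1)
        corr = fa * fb               # (N, C, H, W), high where features agree
        pooled = corr.mean(dim=(2, 3))            # (N, C)
        return torch.sigmoid(self.head(pooled))   # probability of same log

model = CorrelationMatcher()
x1, x2 = torch.randn(2, 3, 64, 64), torch.randn(2, 3, 64, 64)
print(model(x1, x2).shape)  # torch.Size([2, 1])
```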
Citations: 1
Logical Layout Analysis using Deep Learning
Pub Date : 2019-12-01 DOI: 10.1109/DICTA47822.2019.8946046
Annus Zulfiqar, A. Ul-Hasan, F. Shafait
Logical layout analysis plays an important part in document understanding. It can be a challenging task due to varying formats and layouts. Researchers have proposed different ways to solve this problem, mostly using visual information in some way together with a complex pipeline. In this paper, we present a simple technique for labelling the logical structures in document images. We use visual and textual features from the document images to label zones. We utilize Recurrent Neural Networks, specifically two layers of LSTM, which take as input the text of the zone to be classified as a sequence of words, together with the normalized position of each word with respect to the page width and height. The image under test is compared with the known layouts, and labels are assigned to zones accordingly. The labels are abstract, title, author names, and affiliation; the text itself also carries very important information for the task at hand. The presented approach achieved an overall accuracy of 96.21% on the publicly available MARG dataset.
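A sketch of the described classifier shape is given below: word embeddings concatenated with each word's normalized page position, fed through a two-layer LSTM; vocabulary size, dimensions, and the label set shown are illustrative assumptions.

```python
# Hedged sketch of a zone classifier: word ids plus normalized (x, y) word
# positions through a 2-layer LSTM. All sizes are assumed for illustration.
import torch
import torch.nn as nn

class ZoneClassifier(nn.Module):
    def __init__(self, vocab_size=10000, embed_dim=64, hidden=128, n_labels=4):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        # +2 input features: word position normalized by page width/height.
        self.lstm = nn.LSTM(embed_dim + 2, hidden, num_layers=2, batch_first=True)
        self.fc = nn.Linear(hidden, n_labels)  # abstract/title/authors/affiliation

    def forward(self, word_ids, norm_xy):
        feats = torch.cat([self.embed(word_ids), norm_xy], dim=-1)
        _, (h, _) = self.lstm(feats)
        return self.fc(h[-1])  # logits from the top layer's final hidden state

model = ZoneClassifier()
words = torch.randint(0, 10000, (8, 20))  # batch of 8 zones, 20 words each
positions = torch.rand(8, 20, 2)          # (x/page_w, y/page_h) per word
print(model(words, positions).shape)      # torch.Size([8, 4])
```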
Citations: 5
Multimodal Brain Tumour Segmentation using Densely Connected 3D Convolutional Neural Network
Pub Date : 2019-12-01 DOI: 10.1109/DICTA47822.2019.8946023
M. Ghaffari, A. Sowmya, R. Oliver, Len Hamey
Reliable brain tumour segmentation from brain scans is essential for accurate diagnosis and treatment planning. In this paper, we propose a semantic segmentation method based on convolutional neural networks for brain tumour segmentation using multimodal brain scans. The proposed model is a modified version of the well-known U-net architecture. It benefits from DenseNet blocks between the encoder and decoder parts of the U-net to transfer more semantic information from the input to the output. In addition, to speed up the training process, we employed deep supervision by adding segmentation blocks at the end of the decoder layers and summing their outputs to generate the final output of the network. We trained and evaluated our model using the BraTS 2018 dataset. Comparing the results of the proposed model with a generic U-net, our model achieved higher segmentation accuracy in terms of the Dice score.
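The deep-supervision idea can be sketched as follows (in 2D for brevity, although the paper's network is 3D): each decoder stage emits an auxiliary segmentation map, coarser maps are upsampled, and the maps are summed to form the final output. Channel counts and stage numbers are assumptions.

```python
# Minimal sketch of summed deep supervision over decoder stages (2D stand-in
# for the paper's 3D network); shapes and stage counts are illustrative.
import torch
import torch.nn as nn
import torch.nn.functional as F

class DeepSupervisionHead(nn.Module):
    def __init__(self, decoder_channels=(256, 128, 64), n_classes=4):
        super().__init__()
        # One 1x1 "segmentation block" per decoder stage.
        self.seg_blocks = nn.ModuleList(
            nn.Conv2d(c, n_classes, kernel_size=1) for c in decoder_channels
        )

    def forward(self, decoder_feats):
        target_size = decoder_feats[-1].shape[2:]  # finest resolution
        out = 0
        for feat, seg in zip(decoder_feats, self.seg_blocks):
            m = seg(feat)
            m = F.interpolate(m, size=target_size, mode="bilinear",
                              align_corners=False)
            out = out + m  # summed auxiliary predictions form the final output
        return out

head = DeepSupervisionHead()
feats = [torch.randn(1, 256, 16, 16), torch.randn(1, 128, 32, 32),
         torch.randn(1, 64, 64, 64)]
print(head(feats).shape)  # torch.Size([1, 4, 64, 64])
```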
Citations: 7
Deep Corrosion Assessment for Electrical Transmission Towers
Pub Date : 2019-12-01 DOI: 10.1109/DICTA47822.2019.8945905
Teng Zhang, Liangchen Liu, A. Wiliem, Stephen Connor, Zelkjo Ilich, Eddie Van Der Draai, B. Lovell
Galvanised steel transmission towers in electrical power grids suffer from different levels of corrosion depending on age and environment. To ensure the power grid can operate safely, significant resources are spent on monitoring the corrosion level of towers. Photographs from helicopters, drones, and field staff are often used to capture tower condition; however, these images still need manual inspection to determine the corrosion level before maintenance works are carried out. In this paper, we describe a framework employing multiple deep neural network based classifiers and detectors to perform automatic image-based condition monitoring for steel transmission towers. Given a random variety of images of a tower, our proposed framework first determines the location of each image on the structure via a trained zone classifier. Then, fine-grain corrosion inspection is performed on fasteners and structural members, respectively. In addition, an automatic zoom-in functionality is applied to images which have high resolution but were taken from a long distance, ensuring detection performance on small objects on the tower. Finally, the overall corrosion status report for the tower is calculated and generated automatically. Additionally, we release a subset of our data to contribute to this novel direction. Experiments show that our framework can assess the tower efficiently.
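The staged pipeline can be outlined as below, with hypothetical stand-ins for the trained zone classifier, object detector, and corrosion classifier; none of these names or the aggregation scheme come from the paper.

```python
# Schematic of the multi-stage assessment pipeline with hypothetical model
# objects; the report aggregation (mean level per zone) is an assumption.
from dataclasses import dataclass, field

@dataclass
class TowerReport:
    zone_scores: dict = field(default_factory=dict)

    def add(self, zone, level):
        self.zone_scores.setdefault(zone, []).append(level)

    def overall(self):
        # Simple aggregate: mean corrosion level per zone (assumed scheme).
        return {z: sum(v) / len(v) for z, v in self.zone_scores.items()}

def assess_tower(images, zone_classifier, detector, corrosion_classifier):
    report = TowerReport()
    for img in images:
        zone = zone_classifier(img)       # which part of the tower
        for crop in detector(img):        # fasteners / structural members
            report.add(zone, corrosion_classifier(crop))
    return report.overall()

# Toy stand-ins so the sketch runs end to end.
result = assess_tower(
    ["img_a", "img_b"],
    zone_classifier=lambda im: "upper_section",
    detector=lambda im: [im + "_bolt1", im + "_bolt2"],
    corrosion_classifier=lambda crop: 2,  # e.g. level on a 0-4 scale
)
print(result)  # {'upper_section': 2.0}
```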
Citations: 2
Efficient Block Pruning Based on Kernel and Feature Stablization
Pub Date : 2019-12-01 DOI: 10.1109/DICTA47822.2019.8946001
Sheng Xu, Hanlin Chen, Kexin Liu, Jinhu Lii, Baochang Zhang
With the development of computer vision research, the architectures of convolutional neural networks have become more and more complex in order to reach state-of-the-art performance. Is the complexity of a model necessarily proportional to its accuracy? To answer this, network compression has attracted much attention in academia and industry. Existing network pruning methods mostly rely on scoring mechanisms based on the complexity or diversity of kernels to compress the network, and then rebuild the network model after removing kernels, by fine-tuning or retraining on the input data. These methods are cumbersome and depend on a well-trained pre-trained model. In this paper, we propose an end-to-end method that prunes blocks efficiently based on kernel and feature stability. To accomplish this, we first introduce a mask to scale the output of the blocks, and an L1 regularization term to monitor the mask update. Second, we introduce the Center Loss to guarantee that the features do not deviate greatly during learning. To converge fast, we introduce the fast iterative shrinkage-thresholding algorithm (FISTA) to optimize the mask, by which a faster and more reliable pruning process is achieved. We conducted experiments on different datasets, including CIFAR-10 and ImageNet ILSVRC2012. All the experiments achieved state-of-the-art accuracy.
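The mask-plus-L1 mechanism can be sketched as follows: a learnable scalar mask scales each block's output, and the L1 proximal (soft-thresholding) step used inside ISTA/FISTA drives weak masks exactly to zero, marking blocks as prunable. Hyperparameters and the simplified update (FISTA momentum omitted) are assumptions.

```python
# Hedged sketch: a block scaled by a learnable mask, an L1-driven
# soft-thresholding update on the mask (the proximal step of ISTA/FISTA,
# momentum omitted), and a stand-in task loss. Values are illustrative.
import torch
import torch.nn as nn

class MaskedBlock(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.BatchNorm2d(channels), nn.ReLU(),
        )
        self.mask = nn.Parameter(torch.ones(1))  # scales the whole block

    def forward(self, x):
        return x + self.mask * self.body(x)  # mask -> 0 disables the block

def soft_threshold(mask, lam, lr):
    """L1 proximal step: shrink |m| by lam*lr, clipping at zero."""
    with torch.no_grad():
        mask.copy_(torch.sign(mask) * torch.clamp(mask.abs() - lam * lr, min=0))

block = MaskedBlock(16)
x = torch.randn(1, 16, 8, 8)
loss = block(x).pow(2).mean()  # stand-in for task loss (+ center loss)
loss.backward()
with torch.no_grad():
    block.mask -= 0.1 * block.mask.grad  # gradient step on the smooth part
soft_threshold(block.mask, lam=0.5, lr=0.1)
print(float(block.mask))  # repeated steps shrink the mask; 0 => prunable
```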
Citations: 1
OGaze: Gaze Prediction in Egocentric Videos for Attentional Object Selection
Pub Date : 2019-12-01 DOI: 10.1109/DICTA47822.2019.8945893
Mohammad Al-Naser, Shoaib Ahmed Siddiqui, Hiroki Ohashi, Sheraz Ahmed, Nakamura Katsuyki, Takuto Sato, A. Dengel
This paper proposes a novel gaze-estimation model for attentional object selection tasks. The key features of our model are two-fold: (i) usage of deformable convolutional layers to better incorporate the spatial dependencies of differently shaped objects and backgrounds, and (ii) formulation of the gaze-estimation problem in two different ways, i.e. as a classification as well as a regression problem. We combine the two formulations using a joint loss that incorporates both the cross-entropy and the mean-squared error in order to train our model. The experimental results on two publicly available datasets indicate that our model not only achieved real-time performance (13–18 FPS), but also outperformed the state-of-the-art models on the OSdataset, along with comparable performance on the GTEA-plus dataset.
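The two ingredients can be sketched together as below, using torchvision's DeformConv2d with offsets predicted by a small convolution, and a joint cross-entropy-plus-MSE loss over a coarse gaze grid and a continuous gaze point; the grid size, backbone, and loss weighting are assumptions.

```python
# Hedged sketch: deformable convolution (torchvision.ops.DeformConv2d) plus a
# joint classification/regression gaze loss. Sizes and weighting are assumed.
import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision.ops import DeformConv2d

class GazeNet(nn.Module):
    def __init__(self, grid=7):
        super().__init__()
        self.grid = grid
        # 2 * kh * kw = 18 offset channels for a 3x3 deformable kernel.
        self.offset = nn.Conv2d(3, 18, 3, padding=1)
        self.deform = DeformConv2d(3, 32, 3, padding=1)
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.cls_head = nn.Linear(32, grid * grid)  # which grid cell
        self.reg_head = nn.Linear(32, 2)            # (x, y) in [0, 1]

    def forward(self, x):
        f = F.relu(self.deform(x, self.offset(x)))
        f = self.pool(f).flatten(1)
        return self.cls_head(f), torch.sigmoid(self.reg_head(f))

def joint_loss(logits, coords, cell_target, xy_target, alpha=1.0):
    # Cross-entropy on the coarse grid plus MSE on the continuous gaze point.
    return F.cross_entropy(logits, cell_target) + \
           alpha * F.mse_loss(coords, xy_target)

net = GazeNet()
logits, coords = net(torch.randn(4, 3, 64, 64))
loss = joint_loss(logits, coords, torch.randint(0, 49, (4,)), torch.rand(4, 2))
print(loss.item())
```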
Citations: 7
Indian Sign Language Gesture Recognition using Image Processing and Deep Learning
Pub Date : 2019-12-01 DOI: 10.1109/DICTA47822.2019.8945850
N. Bhagat, Y. Vishnusai, G. Rathna
Speech-impaired people use hand gestures to communicate. Unfortunately, the vast majority of people are not aware of the semantics of these gestures. In an attempt to bridge this gap, we propose a real-time hand gesture recognition system based on data captured by the Microsoft Kinect RGB-D camera. Given that there is no one-to-one mapping between the pixels of the depth camera and the RGB camera, we used computer vision techniques such as 3D reconstruction and affine transformation. After achieving one-to-one mapping, the hand gestures were segmented from the background noise. Convolutional Neural Networks (CNNs) were utilised to train 36 static gestures relating to Indian Sign Language (ISL) alphabets and numbers. The model achieved an accuracy of 98.81% when trained using 45,000 RGB images and 45,000 depth images. Convolutional LSTMs were then used to train 10 ISL dynamic word gestures, and an accuracy of 99.08% was obtained by training on 1080 videos. The model showed accurate real-time performance on the prediction of ISL static gestures, leaving scope for further research on sentence formation through gestures. The model also showed competitive adaptability to American Sign Language (ASL) gestures: when the ISL model's weights were transfer-learned to ASL, this resulted in 97.71% accuracy.
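Once depth and RGB pixels are aligned, a simple depth-band segmentation of the hand can look like the sketch below, assuming the signing hand is the closest object to the camera; the thresholds are illustrative, not the paper's exact procedure.

```python
# Hedged sketch of depth-based hand segmentation on aligned RGB-D frames:
# keep pixels within a near-depth band and mask the RGB image with it.
import numpy as np

def segment_hand(rgb, depth, band_mm=150):
    """Mask RGB pixels whose depth lies within band_mm of the nearest point."""
    valid = depth > 0                # 0 = no depth reading on Kinect
    nearest = depth[valid].min()     # assume the hand is the closest object
    mask = valid & (depth < nearest + band_mm)
    out = rgb.copy()
    out[~mask] = 0                   # black out the background
    return out, mask

# Toy frame: a 4x4 image with the "hand" at ~800 mm, background at ~2000 mm.
rgb = np.full((4, 4, 3), 200, dtype=np.uint8)
depth = np.full((4, 4), 2000, dtype=np.uint16)
depth[1:3, 1:3] = 800
seg, mask = segment_hand(rgb, depth)
print(mask.sum())  # 4 pixels kept as the hand region
```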
Citations: 33
Radiography Contrast Enhancement: Smoothed LHE Filter a Practical Solution for Digital X-Rays with Mach Band
Pub Date : 2019-12-01 DOI: 10.1109/DICTA47822.2019.8946114
P. Ambalathankandy, Yafei Ou, Jyotsna Kochiyil, Shinya Takamaeda-Yamazaki, M. Motomura, T. Asai, M. Ikebe
In this paper, we analyze and demonstrate the usefulness of smoothed LHE (Local Histogram Equalization) filters for processing images with low contrast, such as digital radiographic images. Digital X-rays are known to exhibit optical illusions like Mach bands and background contrast effects, which are caused by lateral inhibition phenomena. We observe that using multilayer (ML) methods with the latest edge-preserving filters for contrast enhancement in medical images can be problematic and could lead to faulty diagnosis due to detail exaggeration caused by uncontrolled texture boosting from user-defined gain settings. ML filters are designed with a few subjectively selected filter kernel sizes, which can result in unnaturalness in the output images. We propose a smoothed LHE-like filter with adaptive gain control that is more robust and can enhance fine details in digital X-rays while maintaining their intrinsic naturalness. Preserving naturalness in X-ray images is essential for radiographic diagnostics. Our proposed filter has O(1) complexity and can easily be controlled and operated with a continuously varying kernel size; it functions like an active high-pass filter, amplifying all frequencies within the kernel.
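One assumed approximation of such a filter, shown below, builds the local mean from box filters (constant cost per pixel regardless of kernel size, which is what gives O(1) complexity) and applies a gain capped by the local standard deviation; this is a sketch in the spirit of the method, not the authors' exact filter.

```python
# Assumed O(1)-per-pixel local contrast enhancer in the spirit of a smoothed
# LHE filter: box-filter local statistics plus an adaptive, capped gain.
import numpy as np
import cv2

def local_contrast_enhance(img, ksize=31, max_gain=3.0, target=0.2, eps=1e-3):
    x = img.astype(np.float32) / 255.0
    mean = cv2.blur(x, (ksize, ksize))             # local mean, O(1) per pixel
    var = cv2.blur(x * x, (ksize, ksize)) - mean * mean
    std = np.sqrt(np.maximum(var, 0)) + eps
    # Adaptive gain: regions already high in local contrast get gain < 1;
    # the cap avoids noise blow-up in flat areas.
    gain = np.minimum(max_gain, target / std)
    out = mean + gain * (x - mean)                 # amplify the detail layer
    return np.clip(out * 255.0, 0, 255).astype(np.uint8)

img = (np.random.rand(128, 128) * 255).astype(np.uint8)  # stand-in X-ray
print(local_contrast_enhance(img).shape)                  # (128, 128)
```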
Citations: 1
Feature Engineering Meets Deep Learning: A Case Study on Table Detection in Documents
Pub Date : 2019-12-01 DOI: 10.1109/DICTA47822.2019.8945929
M. Shahzad, Rabeya Noor, Sheraz Ahmad, A. Mian, F. Shafait
Traditional computer vision approaches relied heavily on hand-crafted features for tasks such as visual object detection and recognition. The recent success of deep learning in automatically extracting representative and powerful features from images has brought a paradigm shift in this area. As a side effect, decades of research into hand-crafted features is considered outdated. In this paper, we present an approach to table detection in which we combine a deep learning based table detection model with hand-crafted features from a classical table detection method. We demonstrate that, by using a suitable encoding of the hand-crafted features, the deep learning model is able to perform better at the detection task. Experiments on the publicly available UNLV dataset show that the presented method achieves an accuracy comparable with state-of-the-art deep learning methods without the need for extensive hyper-parameter tuning.
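One plausible encoding of classical cues, sketched below under stated assumptions, extracts horizontal and vertical ruling lines with morphological opening (a staple of classical table detection) and stacks them with the grayscale page as extra input channels for the CNN; the specific encoding is an assumption, not necessarily the paper's.

```python
# Hedged sketch: encode hand-crafted ruling-line cues as extra input channels
# for a deep table detector. Kernel lengths and threshold settings are assumed.
import numpy as np
import cv2

def handcrafted_channels(gray):
    """Return an (H, W, 3) array: page, horizontal lines, vertical lines."""
    binary = cv2.adaptiveThreshold(gray, 255, cv2.ADAPTIVE_THRESH_MEAN_C,
                                   cv2.THRESH_BINARY_INV, 15, 10)
    h_kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (25, 1))
    v_kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (1, 25))
    h_lines = cv2.morphologyEx(binary, cv2.MORPH_OPEN, h_kernel)
    v_lines = cv2.morphologyEx(binary, cv2.MORPH_OPEN, v_kernel)
    return np.dstack([gray, h_lines, v_lines]).astype(np.float32) / 255.0

page = (np.random.rand(256, 256) * 255).astype(np.uint8)  # stand-in document
x = handcrafted_channels(page)
print(x.shape)  # (256, 256, 3) -> feed to the CNN in place of an RGB image
```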
Citations: 0