
Latest publications: 2019 Digital Image Computing: Techniques and Applications (DICTA)

Benchmarking Object Detection Networks for Image Based Reference Detection in Document Images
Pub Date : 2019-12-01 DOI: 10.1109/DICTA47822.2019.8945991
Syed Tahseen Raza Rizvi, Adriano Lucieri, A. Dengel, Sheraz Ahmed
In this paper we evaluate the performance of state-of-the-art object detection models for the task of bibliographic reference detection from document images. The motivation for applying object detection models to this task comes from how humans perceive a document containing bibliographic references: they can easily distinguish between references at a glance, purely from the layout and without understanding the content. Existing state-of-the-art systems for bibliographic reference detection are based purely on textual content. By contrast, we employed four state-of-the-art object detection models and compared their performance with state-of-the-art text-based reference extraction models. Evaluations were performed on the publicly available ICONIP dataset for image-based reference detection, which contains 455 scanned bibliographic documents with 8766 references from Social Sciences books and journals. The results reveal the superiority of image-based methods for reference detection in document images.
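To illustrate the kind of scoring such a benchmark relies on, the sketch below (Python; not the authors' code — the box format, greedy matching and 0.5 IoU threshold are assumptions) matches predicted reference boxes against ground-truth boxes and derives precision and recall.

```python
def iou(a, b):
    """Intersection-over-Union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / float(area_a + area_b - inter) if inter else 0.0

def match_references(pred_boxes, gt_boxes, thresh=0.5):
    """Greedy one-to-one matching; returns (true positives, false positives, false negatives)."""
    unmatched_gt = list(gt_boxes)
    tp = 0
    for p in pred_boxes:
        best = max(unmatched_gt, key=lambda g: iou(p, g), default=None)
        if best is not None and iou(p, best) >= thresh:
            unmatched_gt.remove(best)
            tp += 1
    return tp, len(pred_boxes) - tp, len(unmatched_gt)

# Example: two predicted reference regions scored against two ground-truth regions.
tp, fp, fn = match_references([(10, 10, 200, 40), (10, 50, 200, 80)],
                              [(12, 12, 198, 42), (10, 90, 200, 120)])
precision, recall = tp / (tp + fp), tp / (tp + fn)
```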
Pages: 1-8
Citations: 5
Reading Meter Numbers in the Wild
Pub Date : 2019-12-01 DOI: 10.1109/DICTA47822.2019.8945969
Alessandro Calefati, I. Gallo, Shah Nawaz
In this work we introduce a pipeline to detect and recognize various utility meter numbers in the wild. The system leverages deep neural networks for both detection and recognition. In the detection phase, we employ a fully convolutional neural network to perform pixel-wise classification, while the recognition phase employs another deep neural network to predict the number of digits and the individual digits of a meter reading. We qualitatively show that the proposed approach is robust against severe perspective distortions, different lighting conditions and blurred images. Furthermore, it is capable of detecting small-scale digits. Our approach is suitable for billing companies aiming to increase efficiency by lowering the time consumed by manual checks in the billing process. Finally, we release the dataset used in this work to benchmark the task.
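The sketch below shows one plausible PyTorch layout (an assumption, not the authors' network) of a recognition head that jointly predicts the digit count and the per-position digit classes of a cropped meter region, in the spirit of the description above.

```python
import torch
import torch.nn as nn

class MeterDigitReader(nn.Module):
    def __init__(self, max_digits=8, num_classes=10):
        super().__init__()
        self.backbone = nn.Sequential(          # tiny stand-in feature extractor
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten())
        self.length_head = nn.Linear(64, max_digits + 1)   # predicts 0..max_digits digits
        self.digit_heads = nn.ModuleList(
            [nn.Linear(64, num_classes) for _ in range(max_digits)])

    def forward(self, x):
        f = self.backbone(x)
        length_logits = self.length_head(f)
        digit_logits = torch.stack([h(f) for h in self.digit_heads], dim=1)
        return length_logits, digit_logits      # shapes: (B, max_digits+1), (B, max_digits, 10)

# Usage on a dummy crop of a detected meter region.
model = MeterDigitReader()
length_logits, digit_logits = model(torch.randn(1, 3, 64, 192))
n_digits = length_logits.argmax(1).item()
digits = digit_logits.argmax(2)[0, :n_digits].tolist()
```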
Pages: 1-6
Citations: 5
Runway Detection and Localization in Aerial Images using Deep Learning
Pub Date : 2019-12-01 DOI: 10.1109/DICTA47822.2019.8945889
Javeria Akbar, M. Shahzad, M. I. Malik, A. Ul-Hasan, F. Shafait
Landing is the most difficult phase of flight for any airborne platform. Due to the lack of efficient systems, there have been numerous landing accidents resulting in damage to onboard hardware. Vision-based systems provide a low-cost solution for detecting landing sites by providing rich textual information. To this end, this research focuses on the accurate detection and localization of runways in aerial images with untidy terrain, which would help aerial platforms, especially Unmanned Aerial Vehicles (commonly referred to as drones), detect landing targets (i.e., runways) to aid automatic landing. Most prior work on runway detection is based on simple image processing algorithms with many assumptions and constraints about the precise position of the runway in a particular image. The first part of this research develops a runway detection algorithm based on state-of-the-art deep learning architectures, while the second part addresses runway localization using both deep learning and non-deep-learning methods. The proposed runway detection approach is two-stage and modular: in the first stage, aerial image classification determines whether a runway exists in a particular image; in the second stage, the identified runways are localized using both conventional line detection algorithms and more recent deep learning models. Runway classification achieved an accuracy of around 97%, while the runways were localized with a mean Intersection-over-Union (IoU) score of 0.8.
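As an illustration of the conventional line-detection route to runway localization mentioned above, the OpenCV sketch below extracts long straight edge segments from an aerial image; all thresholds are illustrative assumptions, not the authors' settings.

```python
import cv2
import numpy as np

def localize_runway_lines(image_bgr):
    """Return candidate runway edge segments as (x1, y1, x2, y2) tuples."""
    gray = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2GRAY)
    blurred = cv2.GaussianBlur(gray, (5, 5), 0)      # suppress terrain texture
    edges = cv2.Canny(blurred, 50, 150)               # edge map
    lines = cv2.HoughLinesP(edges, 1, np.pi / 180, 80,
                            minLineLength=100, maxLineGap=10)
    return [] if lines is None else [tuple(seg[0]) for seg in lines]

# Usage (assuming the image has already been classified as containing a runway):
# segments = localize_runway_lines(cv2.imread("aerial.jpg"))
```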
Pages: 1-8
Citations: 14
Enhanced Micro Target Detection through Local Motion Feedback in Biologically Inspired Algorithms
Pub Date : 2019-12-01 DOI: 10.1109/DICTA47822.2019.8945846
A. Melville-Smith, A. Finn, R. Brinkworth
Looking for micro targets (objects in the range of 1.2×1.2 pixels) that are moving in electro-optic imagery is a relatively simple task when the background is perfectly still. Once motion is introduced into the background, such as movement from trees and bushes or ego-motion induced by a moving platform, the task becomes much more difficult. Flies have a method of dealing with such motion while still being able to detect small moving targets. This paper takes an existing model based on the fly's early visual system and compares it to existing methods of target detection. High dynamic range imagery is used, and motion is induced to reflect the effects of a rotating platform. The model of the fly's visual system is then enhanced to include local area motion feedback to help separate the moving background from moving targets in cluttered scenes. This feedback increases the performance of the system, showing a general improvement of over 80% from the baseline model, and 30 times better performance than the pixel-based adaptive segmenter and local contrast methods. These results indicate that the enhanced model is able to perform micro target detection with better discrimination between targets and the background in cluttered scenes from a moving platform.
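As a rough illustration of the general idea (a NumPy stand-in only, not the fly-inspired model evaluated in the paper), the sketch below subtracts a locally estimated background motion from the pixel-wise frame change, so that objects moving differently from their neighbourhood stand out; the neighbourhood size and threshold are assumptions.

```python
import numpy as np
from scipy.ndimage import uniform_filter

def small_target_response(prev_frame, frame, neighbourhood=15):
    """Per-pixel response emphasising motion that differs from the local average motion."""
    change = frame.astype(np.float32) - prev_frame.astype(np.float32)
    local_motion = uniform_filter(change, size=neighbourhood)   # local-area motion estimate
    return np.abs(change - local_motion)                        # residual used as a small-target cue

# Usage with two consecutive grayscale frames:
# response = small_target_response(frames[t - 1], frames[t])
# candidates = response > response.mean() + 4 * response.std()
```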
Pages: 1-8
Citations: 5
Detection of Central Retinal Vein Occlusion using Guided Salient Features
Pub Date : 2019-12-01 DOI: 10.1109/DICTA47822.2019.8945864
N. Rajapaksha, L. Ranathunga, K. Bandara
Central Retinal Vein Occlusion (CRVO) is one of the most common retinal vascular disorders in the world. Since it causes sudden, permanent vision loss, it is crucial to detect and treat CRVO immediately to avoid further vision deterioration. However, manually diagnosing CRVO is a time-consuming task that requires the constant attention of an ophthalmologist. Although a considerable amount of research has been conducted on detecting Branch Retinal Vein Occlusion, only a few approaches have been proposed to identify CRVO automatically using fundus images. Multiple other approaches have been proposed to detect individual symptoms of CRVO, such as hemorrhages and macular oedema, separately. This paper proposes a guided, salient-feature-based approach to detect CRVO automatically. The overall system achieved 89.04% accuracy with 72.5% precision and 70.73% recall. Furthermore, it introduces novel approaches to detect retinal hemorrhages and the optic disc. As it is a guided framework based on expert knowledge, this study eliminates the haziness present in probabilistic feature selection approaches.
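The sketch below conveys the "guided", expert-rule flavour of such an approach in highly simplified form (all thresholds are hypothetical and this is not the paper's pipeline): derive an interpretable feature, such as the fraction of the fundus covered by dark, red-dominant candidate haemorrhage pixels, and apply an explicit rule instead of a probabilistic feature selector.

```python
import numpy as np

def hemorrhage_fraction(rgb, fundus_mask):
    """Fraction of the fundus area whose pixels look dark and red-dominant (illustrative thresholds)."""
    r = rgb[..., 0].astype(float)
    g = rgb[..., 1].astype(float)
    b = rgb[..., 2].astype(float)
    candidate = (r < 90) & (r > g + 10) & (r > b + 10)
    return (candidate & fundus_mask).sum() / max(fundus_mask.sum(), 1)

def flag_crvo(rgb, fundus_mask, rule_threshold=0.05):
    """Hypothetical expert rule: flag the image when candidate haemorrhage coverage is large."""
    return hemorrhage_fraction(rgb, fundus_mask) > rule_threshold
```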
Pages: 1-6
Citations: 2
Historical Document Text Binarization using Atrous Convolution and Multi-Scale Feature Decoder
Pub Date : 2019-12-01 DOI: 10.1109/DICTA47822.2019.8946108
Hanif Rasyidi, S. Khan
This paper presents a segmentation-based binarization model that extracts text information from historical documents using convolutional neural networks. The proposed method uses atrous convolution for feature extraction to learn useful text patterns from the document without significantly reducing the spatial size of the image. The model then combines the extracted features using a multi-scale decoder to construct a binary image that contains only the text information from the document. We train our model using a series of DIBCO competition datasets and compare the results with existing text binarization methods as well as a state-of-the-art object segmentation model. The experimental results on the H-DIBCO 2016 dataset show that our method performs excellently on the pseudo F-Score metric, surpassing the results of various existing methods.
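The compact PyTorch sketch below shows the two ingredients described above in one possible layout (an assumption, not the paper's exact architecture): dilated (atrous) convolutions that grow the receptive field without downsampling, and a decoder that fuses features from several scales into a per-pixel text/background prediction.

```python
import torch
import torch.nn as nn

class AtrousBinarizer(nn.Module):
    def __init__(self):
        super().__init__()
        self.stem = nn.Sequential(nn.Conv2d(1, 32, 3, padding=1), nn.ReLU())
        self.atrous1 = nn.Sequential(nn.Conv2d(32, 32, 3, padding=2, dilation=2), nn.ReLU())
        self.atrous2 = nn.Sequential(nn.Conv2d(32, 32, 3, padding=4, dilation=4), nn.ReLU())
        self.decode = nn.Conv2d(96, 1, 1)           # fuse stem + both atrous scales

    def forward(self, x):
        f0 = self.stem(x)
        f1 = self.atrous1(f0)
        f2 = self.atrous2(f1)
        fused = torch.cat([f0, f1, f2], dim=1)      # multi-scale feature fusion
        return torch.sigmoid(self.decode(fused))    # probability of text per pixel

# Usage on a grayscale document patch:
prob = AtrousBinarizer()(torch.rand(1, 1, 128, 128))
binary = (prob > 0.5).float()
```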
Pages: 1-8
Citations: 2
Using Style-Transfer to Understand Material Classification for Robotic Sorting of Recycled Beverage Containers
Pub Date : 2019-12-01 DOI: 10.1109/DICTA47822.2019.8945993
M. McDonnell, Bahar Moezzi, R. Brinkworth
Robotic sorting machines are increasingly being investigated for use in recycling centers. We consider the problem of automatically classifying images of recycled beverage containers by material type, i.e. glass, plastic, metal or liquid-packaging-board, when the containers are not in their original condition, meaning their shape and size may be deformed and their coloring and packaging labels may be damaged or dirty. We describe a retrofitted computer vision system and a deep convolutional neural network classifier designed for this purpose, which enabled a sorting machine's accuracy and speed to reach commercially viable benchmarks. We investigate what matters most for highly accurate container material recognition: shape, size, color, texture, or all of these. To help answer this question, we made use of style-transfer methods from the field of deep learning. We found that removing either texture or shape cues significantly reduced the accuracy of container material classification, while removing color had a minor negative effect. Unlike recent work on generic objects in ImageNet, networks trained to classify by container material type learned better from object shape than from texture. Our findings show that commercial sorting of recycled beverage containers by material type at high accuracy is feasible, even when the containers are in poor condition. Furthermore, we reinforce the recent finding that convolutional neural networks can learn predominantly from either texture cues or shape.
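The sketch below illustrates the cue-ablation idea with crude stand-ins (not the style-transfer manipulations used in the paper): evaluate the same trained classifier on colour-removed and shape-disrupted copies of the test images and compare the accuracy drops; `classifier` and `dataset` are assumed to exist.

```python
import random
import numpy as np

def remove_colour(img):
    """Greyscale copy replicated over 3 channels: shape and texture kept, colour dropped."""
    return np.repeat(img.mean(axis=2, keepdims=True), 3, axis=2).astype(img.dtype)

def disrupt_shape(img, tile=32, seed=0):
    """Shuffle square tiles so local texture survives but the global container shape does not."""
    h, w, _ = img.shape
    coords = [(y, x) for y in range(0, h - h % tile, tile)
                     for x in range(0, w - w % tile, tile)]
    tiles = [img[y:y + tile, x:x + tile].copy() for y, x in coords]
    random.Random(seed).shuffle(tiles)
    out = img.copy()
    for (y, x), t in zip(coords, tiles):
        out[y:y + tile, x:x + tile] = t
    return out

def accuracy_under(transform, classifier, dataset):
    """dataset: sequence of (image, material_label) pairs; classifier: image -> predicted label."""
    hits = sum(classifier(transform(img)) == label for img, label in dataset)
    return hits / len(dataset)

# Compare accuracy_under(lambda x: x, clf, test), accuracy_under(remove_colour, clf, test)
# and accuracy_under(disrupt_shape, clf, test) to see which cue the classifier relies on.
```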
Pages: 1-8
Citations: 2
Facial-Expression Recognition from Video using Enhanced Convolutional LSTM
Pub Date : 2019-12-01 DOI: 10.1109/DICTA47822.2019.8946025
Ryo Miyoshi, N. Nagata, M. Hashimoto
We propose an enhanced convolutional long short-term memory (ConvLSTM) algorithm, Enhanced ConvLSTM, which adds skip connections in the spatial and temporal directions to conventional ConvLSTM to suppress gradient vanishing and make use of older information. We also propose a method that uses this algorithm to automatically recognize facial expressions from videos. The proposed facial-expression recognition method consists of two Enhanced ConvLSTM streams and two ResNet streams. The Enhanced ConvLSTM streams extract features for fine movements, and the ResNet streams extract features for rough movements. In the Enhanced ConvLSTM streams, spatio-temporal features are extracted by stacking the Enhanced ConvLSTM. We conducted experiments comparing a method using ConvLSTM with skip connections (the proposed Enhanced ConvLSTM) and a method without them (conventional ConvLSTM). The method using Enhanced ConvLSTM achieved 4.44% higher accuracy than the method using conventional ConvLSTM. The proposed facial-expression recognition method also achieved 45.29% accuracy, which is 2.31% higher than that of the conventional method.
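The minimal PyTorch sketch below shows one way a ConvLSTM cell can be given a skip connection so the input is passed forward alongside the recurrent output; it is an assumption about the general mechanism, not the authors' implementation.

```python
import torch
import torch.nn as nn

class ResidualConvLSTMCell(nn.Module):
    def __init__(self, channels, kernel_size=3):
        super().__init__()
        padding = kernel_size // 2
        # one convolution produces the four gates from the concatenated [input, hidden] maps
        self.gates = nn.Conv2d(2 * channels, 4 * channels, kernel_size, padding=padding)

    def forward(self, x, state):
        h, c = state
        i, f, o, g = torch.chunk(self.gates(torch.cat([x, h], dim=1)), 4, dim=1)
        i, f, o, g = torch.sigmoid(i), torch.sigmoid(f), torch.sigmoid(o), torch.tanh(g)
        c_next = f * c + i * g
        h_next = o * torch.tanh(c_next)
        return h_next + x, (h_next, c_next)      # skip connection: input added to the output

# Usage over a short clip of feature maps shaped (batch, time, channels, H, W).
cell = ResidualConvLSTMCell(channels=16)
clip = torch.randn(2, 5, 16, 24, 24)
h = c = torch.zeros(2, 16, 24, 24)
for t in range(clip.size(1)):
    out, (h, c) = cell(clip[:, t], (h, c))
```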
Pages: 1-6
Citations: 9
Flood Detection in Social Media Images using Visual Features and Metadata
Pub Date : 2019-12-01 DOI: 10.1109/DICTA47822.2019.8946007
R. Jony, A. Woodley, Dimitri Perrin
Images uploaded to social media platforms such as Twitter and Flickr have become a potential source of information about natural disasters. However, due to their lower reliability and noisy nature, it is challenging to automatically identify social media images that genuinely contain evidence of natural disasters. Visual features have been popular for classifying these images, while the associated metadata are often ignored or exploited only to a limited extent. To test their potential, we employed them separately to identify social media images with flooding evidence. For visual feature extraction, we utilized three advanced Convolutional Neural Networks (CNNs) pre-trained on two different types of datasets and used a simple neural network for classification. The results demonstrate that combining the two types of visual features has a positive impact on distinguishing natural disaster images. From the metadata, we considered only the textual metadata: we combined all textual metadata, extracted bi-gram features, and then employed a Support Vector Machine (SVM) for the classification task. The results show that combining the textual metadata can improve classification accuracy compared to the individual fields. The results also demonstrate that although the visual feature approach outperforms the metadata approach, metadata have some capability to classify these images. For instance, the proposed visual feature approach achieved a result (MAP = 95.15) similar to the top visual feature approaches presented in MediaEval 2017, while the metadata approach (MAP = 84.52) outperformed the presented metadata methods. For the experiments, we utilized the dataset from the MediaEval 2017 Disaster Image Retrieval from Social Media (DIRSM) task and compared the achieved results with the other methods presented for the task (11 participants).
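The small scikit-learn sketch below illustrates the metadata route described above (field names and data rows are assumptions): concatenate the textual metadata fields, extract bi-gram features, and train an SVM to predict whether an image shows flooding.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

records = [  # hypothetical (title, description, tags, label) metadata rows
    ("river overflow", "street under water after heavy rain", "flood water", 1),
    ("sunset walk", "evening at the beach", "sea holiday", 0),
]
texts = [" ".join([title, desc, tags]) for title, desc, tags, _ in records]
labels = [label for *_, label in records]

model = make_pipeline(
    CountVectorizer(ngram_range=(2, 2)),   # bi-gram features over the combined metadata
    LinearSVC())
model.fit(texts, labels)
print(model.predict(["water rising over the road after the storm"]))
```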
Pages: 1-8
Citations: 7
Assessment and Elimination of Inflammatory Cell: A Machine Learning Approach in Digital Cytology
Pub Date : 2019-12-01 DOI: 10.1109/DICTA47822.2019.8946065
Jing Ke, Junwei Deng, Yizhou Lu, Dadong Wang, Yang Song, Huijuan Zhang
In automatic cytology image diagnosis, false positives and false negatives often arise from inflammatory cells that obscure the identification of abnormal or normal cells. These phenotypes are similar in shape, color and texture to the cells to be detected. In this paper, to assess the inflammation and eliminate its interference with recognizing the cells of interest, we propose a two-stage framework consisting of a deep learning based neural network that detects and estimates the proportion of inflammatory cells, and a morphology based image processing architecture that eliminates them from the digital images using image inpainting. For performance evaluation, we apply the framework to our collected real-life clinical cytology slides, which present a variety of complexities. We evaluate the tests on sub-images cropped from 49 positive and 49 negative slides from different patients, each at a magnification of 40×. The experiments show an accurate profile of the coverage of inflammation in whole-slide images, as well as its proportion among all the cells present in the image. As confirmed by cytotechnologists, more than 96.0% of inflammatory cells are successfully detected at the pixel level and are well inpainted in the cytology images without introducing new recognition problems.
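The minimal OpenCV sketch below illustrates the second, elimination stage (here the inflammatory-cell mask is taken as given, whereas the paper obtains it from a deep detection network; the dilation kernel and inpainting radius are assumptions): remove the detected inflammatory cells by inpainting over them.

```python
import cv2
import numpy as np

def remove_inflammatory_cells(image_bgr, inflammatory_mask):
    """image_bgr: HxWx3 uint8; inflammatory_mask: HxW uint8, non-zero where cells were detected."""
    # dilate slightly so the borders of the detected cells are replaced as well
    mask = cv2.dilate(inflammatory_mask, np.ones((5, 5), np.uint8), iterations=1)
    return cv2.inpaint(image_bgr, mask, 5, cv2.INPAINT_TELEA)   # radius of 5 px is illustrative

# Usage: cleaned = remove_inflammatory_cells(slide_crop, detector_mask)
```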
Pages: 1-8
Citations: 0