Latest publications from 2019 Digital Image Computing: Techniques and Applications (DICTA)

From Chest X-Rays to Radiology Reports: A Multimodal Machine Learning Approach
Pub Date : 2019-12-01 DOI: 10.1109/DICTA47822.2019.8945819
Sonit Singh, Sarvnaz Karimi, K. Ho-Shon, Len Hamey
Interpreting medical images and summarising them in the form of radiology reports is a challenging, tedious, and complex task. A radiologist provides a complete description of a medical image in the form of a radiology report, describing normal or abnormal findings and providing a summary for decision making. Research shows that radiology practice is error-prone due to the limited number of experts, increasing patient volumes, and the subjective nature of human perception. To reduce the number of diagnostic errors and to alleviate the workload of radiologists, there is a need for a computer-aided report generation system that can automatically generate a radiology report for a given medical image. We propose an encoder-decoder based framework that can automatically generate radiology reports from medical images. Specifically, we use a Convolutional Neural Network as an encoder coupled with a multi-stage Stacked Long Short-Term Memory as a decoder to generate reports. We perform experiments on the Indiana University Chest X-ray collection, a publicly available dataset, to measure the effectiveness of our model. Experimental results show the effectiveness of our model in automatically generating radiology reports from medical images.
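As a rough sketch of the encoder-decoder pattern described above, the following PyTorch snippet pairs a CNN image encoder with a stacked LSTM decoder; the ResNet-18 backbone, layer sizes, and token interface are illustrative assumptions, not the configuration reported in the paper.

```python
import torch
import torch.nn as nn
import torchvision.models as models

class ReportGenerator(nn.Module):
    """Minimal CNN encoder + stacked-LSTM decoder sketch (assumed sizes)."""

    def __init__(self, vocab_size, embed_dim=256, hidden_dim=512, num_layers=2):
        super().__init__()
        backbone = models.resnet18(weights=None)  # no pre-trained weights; torchvision >= 0.13 API
        # Drop the classification head; keep the 512-d pooled image feature.
        self.encoder = nn.Sequential(*list(backbone.children())[:-1])
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.init_proj = nn.Linear(512, hidden_dim)
        self.decoder = nn.LSTM(embed_dim, hidden_dim, num_layers, batch_first=True)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, images, tokens):
        feats = self.encoder(images).flatten(1)              # (B, 512)
        h0 = torch.tanh(self.init_proj(feats))               # image feature seeds the decoder
        h0 = h0.unsqueeze(0).repeat(self.decoder.num_layers, 1, 1)
        c0 = torch.zeros_like(h0)
        emb = self.embed(tokens)                              # (B, T, embed_dim) report tokens
        hidden, _ = self.decoder(emb, (h0, c0))
        return self.out(hidden)                               # (B, T, vocab_size) word logits
```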
Citations: 17
Social Network Analysis of an Acoustic Environment: The Use of Visualised Data to Characterise Natural Habitats
Pub Date : 2019-12-01 DOI: 10.1109/DICTA47822.2019.8945963
Junling Wang, M. Sankupellay, D. Konovalov, M. Towsey, P. Roe
(1) Background: Ecologists use acoustic recordings for long-term environmental monitoring. However, as audio recordings are opaque, obtaining meaningful information from them is a challenging task. Calculating summary indices from recordings is one way to reduce the size of audio data, but the amount of information in summary indices is still too large. (2) Method: In this study we explore the application of social network analysis to visually and quantitatively model acoustic changes. To achieve our aim, we clustered summary indices using two algorithms, and the results were used to generate network maps. (3) Results and Discussion: The network maps allowed us to visually perceive acoustic changes within a day and to visually compare one day to another. To enable quantitative comparison, we also calculated summary values from the social network maps, including the Gini coefficient (an economic concept adopted to estimate how unevenly occurrences are distributed). (4) Conclusion: Social network maps and summary values provide visual and quantitative insight into acoustic changes within an environment.
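For the Gini coefficient mentioned above, a minimal NumPy sketch of the uneven-distribution calculation; the cluster occurrence counts are made-up examples, not values from the study.

```python
import numpy as np

def gini(occurrences):
    """Gini coefficient of how unevenly occurrences are distributed.

    `occurrences` is a 1-D array of non-negative counts (e.g. how often each
    cluster of acoustic indices appears in a day). 0 means perfectly even;
    values near 1 mean a few clusters dominate.
    """
    x = np.sort(np.asarray(occurrences, dtype=float))
    n = x.size
    if n == 0 or x.sum() == 0:
        return 0.0
    cum = np.cumsum(x)
    # Standard formula based on the ranked cumulative distribution.
    return (n + 1 - 2 * (cum / cum[-1]).sum()) / n

print(gini([10, 10, 10, 10]))   # 0.0  -> clusters used evenly
print(gini([0, 0, 0, 40]))      # 0.75 -> one cluster dominates
```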
Citations: 2
Runway Detection and Localization in Aerial Images using Deep Learning
Pub Date : 2019-12-01 DOI: 10.1109/DICTA47822.2019.8945889
Javeria Akbar, M. Shahzad, M. I. Malik, A. Ul-Hasan, F. Shafait
Landing is the most difficult phase of flight for any airborne platform. Due to the lack of efficient systems, there have been numerous landing accidents resulting in damage to onboard hardware. Vision-based systems provide a low-cost solution for detecting landing sites by providing rich textual information. To this end, this research focuses on accurate detection and localization of runways in aerial images with untidy terrain, which would consequently help aerial platforms, especially Unmanned Aerial Vehicles (commonly referred to as drones), to detect landing targets (i.e., runways) to aid automatic landing. Most of the prior work on runway detection is based on simple image processing algorithms with many assumptions and constraints about the precise position of the runway in a particular image. The first part of this research develops a runway detection algorithm based on state-of-the-art deep learning architectures, while the second part addresses runway localization using both deep learning and non-deep-learning based methods. The proposed runway detection approach is a two-stage modular one: in the first stage, aerial image classification determines whether a runway exists in a particular image. In the second stage, the identified runways are localized using both conventional line detection algorithms and more recent deep learning models. Runway classification is achieved with an accuracy of around 97%, and the runways are localized with a mean Intersection-over-Union (IoU) score of 0.8.
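The Intersection-over-Union score used above to report localization quality can be computed as in the helper below; this is a generic IoU function under the assumption of axis-aligned boxes, not code from the paper, and the 0.8 figure is the authors' reported result rather than this snippet's output.

```python
def iou(box_a, box_b):
    """Intersection-over-Union of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

# Hypothetical predicted vs. ground-truth runway boxes.
print(iou((0, 0, 100, 50), (10, 0, 100, 60)))
```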
Citations: 14
Enhanced Micro Target Detection through Local Motion Feedback in Biologically Inspired Algorithms
Pub Date : 2019-12-01 DOI: 10.1109/DICTA47822.2019.8945846
A. Melville-Smith, A. Finn, R. Brinkworth
Looking for micro targets (objects in the range of 1.2×1.2 pixels) that are moving in electro-optic imagery is a relatively simple task when the background is perfectly still. Once motion is introduced into the background, such as movement from trees and bushes or ego-motion induced by a moving platform, the task becomes much more difficult. Flies have a method of dealing with such motion while still being able to detect small moving targets. This paper takes an existing model based on the fly's early visual systems and compares it to existing methods of target detection. High dynamic range imagery is used and motion induced to reflect the effects of a rotating platform. The model of the fly's visual system is then enhanced to include local area motion feedback to help separate the moving background from moving targets in cluttered scenes. This feedback increases the performance of the system, showing a general improvement of over 80% from the baseline model, and 30 times better performance than the pixel-based adaptive segmenter and local contrast methods. These results indicate the enhanced model is able to perform micro target detection with better discrimination between targets and the background in cluttered scenes from a moving platform.
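The idea of local motion feedback, suppressing responses where the surrounding neighbourhood is itself moving, can be illustrated with the toy NumPy/SciPy sketch below; it is a crude frame-differencing stand-in written for this listing, not the biologically inspired fly-vision model evaluated in the paper.

```python
import numpy as np
from scipy.ndimage import uniform_filter

def detect_micro_targets(frames, local_win=9, k=3.0):
    """Toy micro-target detector: temporal change minus the local motion average.

    frames: (T, H, W) float array of grey-level frames from a moving platform.
    Returns a boolean map of candidate target pixels in the latest frame.
    """
    # Frame-to-frame change highlights anything that moved.
    diff = np.abs(frames[-1] - frames[-2])

    # Local motion feedback: estimate how much the neighbourhood moved and
    # subtract it, so pixels inside moving clutter (trees, ego-motion) are
    # suppressed while isolated small movers stand out.
    local_motion = uniform_filter(diff, size=local_win)
    residual = diff - local_motion

    # Keep only responses well above the residual statistics.
    threshold = residual.mean() + k * residual.std()
    return residual > threshold
```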
Citations: 5
Detection of Central Retinal Vein Occlusion using Guided Salient Features
Pub Date : 2019-12-01 DOI: 10.1109/DICTA47822.2019.8945864
N. Rajapaksha, L. Ranathunga, K. Bandara
Central Retinal Vein Occlusion (CRVO) is one of the most common retinal vascular disorders in the world. Since it causes sudden permanent vision loss, it is crucial to detect and treat CRVO immediately to avoid further vision deterioration. However, manually diagnosing CRVO is a time-consuming task which requires the constant attention of an ophthalmologist. Although a considerable amount of research has been conducted on detecting Branch Retinal Vein Occlusion, only a few approaches have been proposed to identify CRVO automatically using fundus images. Multiple other approaches have been proposed to detect symptoms of CRVO, such as hemorrhages and macular oedema, separately. This paper proposes a guided salient-feature-based approach to detect CRVO automatically. The overall system achieved 89.04% accuracy with 72.5% precision and 70.73% recall. Furthermore, it introduces novel approaches to detect retinal hemorrhages and the optic disc. As a guided framework based on expert knowledge, this study eliminates the haziness present in probabilistic feature selection approaches.
Citations: 2
Historical Document Text Binarization using Atrous Convolution and Multi-Scale Feature Decoder
Pub Date : 2019-12-01 DOI: 10.1109/DICTA47822.2019.8946108
Hanif Rasyidi, S. Khan
This paper presents a segmentation-based binarization model to extract text information from historical documents using convolutional neural networks. The proposed method uses atrous convolution feature extraction to learn useful text patterns from the document without significantly reducing the spatial size of the image. The model then combines the extracted features using a multi-scale decoder to construct a binary image that contains only the text information from the document. We train our model using a series of DIBCO competition datasets and compare the results with existing text binarization methods as well as a state-of-the-art object segmentation model. The experimental results on the H-DIBCO 2016 dataset show that our method achieves excellent performance on the pseudo F-Score metric, surpassing the results of various existing methods.
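A minimal PyTorch sketch of the atrous (dilated) convolution idea the abstract relies on, where dilation widens the receptive field without downsampling; the channel counts, dilation rates, and single-logit head are assumptions, not the paper's architecture.

```python
import torch.nn as nn

class AtrousTextEncoder(nn.Module):
    """Sketch of a dilated-convolution stack for per-pixel text/background prediction."""

    def __init__(self, in_ch=3, ch=64):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(in_ch, ch, 3, padding=1), nn.ReLU(inplace=True),
            # Increasing dilation rates keep the spatial size unchanged
            # (padding equals dilation for 3x3 kernels) while widening context.
            nn.Conv2d(ch, ch, 3, padding=2, dilation=2), nn.ReLU(inplace=True),
            nn.Conv2d(ch, ch, 3, padding=4, dilation=4), nn.ReLU(inplace=True),
        )
        # 1x1 head produces a per-pixel text/background logit map.
        self.head = nn.Conv2d(ch, 1, 1)

    def forward(self, x):
        return self.head(self.features(x))
```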
Citations: 2
Using Style-Transfer to Understand Material Classification for Robotic Sorting of Recycled Beverage Containers
Pub Date : 2019-12-01 DOI: 10.1109/DICTA47822.2019.8945993
M. McDonnell, Bahar Moezzi, R. Brinkworth
Robotic sorting machines are increasingly being investigated for use in recycling centers. We consider the problem of automatically classifying images of recycled beverage containers by material type, i.e. glass, plastic, metal or liquid-packaging-board, when the containers are not in their original condition, meaning their shape and size may be deformed, and coloring and packaging labels may be damaged or dirty. We describe a retrofitted computer vision system and deep convolutional neural network classifier designed for this purpose, that enabled a sorting machine's accuracy and speed to reach commercially viable benchmarks. We investigate what was more important for highly accurate container material recognition: shape, size, color, texture or all of these? To help answer this question, we made use of style-transfer methods from the field of deep learning. We found that removing either texture or shape cues significantly reduced the accuracy in container material classification, while removing color had a minor negative effect. Unlike recent work on generic objects in ImageNet, networks trained to classify by container material type learned better from object shape than texture. Our findings show that commercial sorting of recycled beverage containers by material type at high accuracy is feasible, even when the containers are in poor condition. Furthermore, we reinforce the recent finding that convolutional neural networks can learn predominantly either from texture cues or shape.
Citations: 2
Facial-Expression Recognition from Video using Enhanced Convolutional LSTM
Pub Date : 2019-12-01 DOI: 10.1109/DICTA47822.2019.8946025
Ryo Miyoshi, N. Nagata, M. Hashimoto
We propose an enhanced convolutional long short-term memory (ConvLSTM) algorithm, i.e., Enhanced ConvLSTM, by adding skip connections in the spatial and temporal directions to conventional ConvLSTM to suppress gradient vanishing and make use of older information. We also propose a method that uses this algorithm to automatically recognize facial expressions from videos. The proposed facial-expression recognition method consists of two Enhanced ConvLSTM streams and two ResNet streams. The Enhanced ConvLSTM streams extract features for fine movements, and the ResNet streams extract features for rough movements. In the Enhanced ConvLSTM streams, spatio-temporal features are extracted by stacking the Enhanced ConvLSTM. We conducted experiments to compare a method using ConvLSTM with skip connections (the proposed Enhanced ConvLSTM) and a method without them (conventional ConvLSTM). The method using Enhanced ConvLSTM had 4.44% higher accuracy than the method using conventional ConvLSTM. The proposed facial-expression recognition method also achieved 45.29% accuracy, which is 2.31% higher than that of the conventional method.
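To make the skip-connection idea concrete, here is a compact PyTorch sketch of a ConvLSTM cell with a residual skip applied around one time step; the gate layout and the exact placement of the skip are assumptions for illustration, not the authors' Enhanced ConvLSTM wiring.

```python
import torch
import torch.nn as nn

class ConvLSTMCell(nn.Module):
    """Standard ConvLSTM cell: all four gates come from one convolution."""

    def __init__(self, in_ch, hid_ch, k=3):
        super().__init__()
        self.hid_ch = hid_ch
        self.gates = nn.Conv2d(in_ch + hid_ch, 4 * hid_ch, k, padding=k // 2)

    def forward(self, x, h, c):
        i, f, o, g = torch.chunk(self.gates(torch.cat([x, h], dim=1)), 4, dim=1)
        i, f, o = torch.sigmoid(i), torch.sigmoid(f), torch.sigmoid(o)
        g = torch.tanh(g)
        c_next = f * c + i * g
        h_next = o * torch.tanh(c_next)
        return h_next, c_next

def skip_convlstm_step(cell, x, h, c):
    """One time step with a spatial skip connection: the input is added back onto
    the new hidden state (assumes input and hidden channel counts match)."""
    h_next, c_next = cell(x, h, c)
    return h_next + x, c_next

# Minimal usage with made-up tensor sizes.
cell = ConvLSTMCell(in_ch=32, hid_ch=32)
x = torch.randn(1, 32, 48, 48)
h = torch.zeros(1, 32, 48, 48)
c = torch.zeros(1, 32, 48, 48)
h, c = skip_convlstm_step(cell, x, h, c)
```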
Citations: 9
Flood Detection in Social Media Images using Visual Features and Metadata
Pub Date : 2019-12-01 DOI: 10.1109/DICTA47822.2019.8946007
R. Jony, A. Woodley, Dimitri Perrin
Images uploaded to social media platforms such as Twitter and Flickr have become a potential source of information about natural disasters. However, due to their lower reliability and noisy nature, it is challenging to automatically identify social media images that genuinely contain evidence of natural disasters. Visual features have been popular for classifying these images, while the associated metadata are often ignored or exploited only to a limited extent. To test their potential, we employed them separately to identify social media images with flooding evidence. For visual feature extraction, we utilized three advanced Convolutional Neural Networks (CNNs) pre-trained on two different types of datasets and used a simple neural network for classification. The results demonstrate that the combination of two types of visual features has a positive impact on distinguishing natural disaster images. From the metadata, we considered only the textual metadata. Here, we combined all textual metadata and extracted bi-gram features, and then employed a Support Vector Machine (SVM) for the classification task. The results show that combining the textual metadata improves classification accuracy compared to its individual components. The results also demonstrate that although the visual feature approach outperforms the metadata approach, metadata have a certain capability to classify these images. For instance, the proposed visual feature approach achieved a result (MAP = 95.15) similar to the top visual feature approaches presented in MediaEval 2017, while the metadata approach (MAP = 84.52) outperformed the presented metadata methods. For the experiments, we utilized the dataset from the MediaEval 2017 Disaster Image Retrieval from Social Media (DIRSM) task and compared the achieved results with the other methods (11 participants) submitted to that task.
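The textual-metadata pipeline (bi-gram features feeding an SVM) can be sketched with scikit-learn as below; the example strings and labels are made up for illustration, and the real experiments used the DIRSM dataset rather than these toy captions.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Toy metadata strings (e.g. title + tags + description concatenated).
texts = [
    "street flooded after heavy rain water rising",
    "river overflow houses under water rescue boats",
    "sunny beach holiday palm trees",
    "city skyline at night lights",
]
labels = [1, 1, 0, 0]  # 1 = flooding evidence, 0 = none

# Bi-gram bag-of-words features feeding a linear SVM.
model = make_pipeline(CountVectorizer(ngram_range=(2, 2)), LinearSVC())
model.fit(texts, labels)

# Predicted label for an unseen caption.
print(model.predict(["flood water covering the road after rain"]))
```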
Citations: 7
Assessment and Elimination of Inflammatory Cell: A Machine Learning Approach in Digital Cytology
Pub Date : 2019-12-01 DOI: 10.1109/DICTA47822.2019.8946065
Jing Ke, Junwei Deng, Yizhou Lu, Dadong Wang, Yang Song, Huijuan Zhang
In automatic cytology image diagnosis, false positives or false negatives often arise from inflammatory cells that obscure the identification of abnormal or normal cells. These cells present a similar appearance in shape, color, and texture to the cells to be detected. In this paper, to evaluate the inflammation and eliminate its disturbance of recognizing cells of interest, we propose a two-stage framework containing a deep learning based neural network to detect and estimate the proportions of inflammatory cells, and a morphology based image processing architecture to eliminate them from the digital images with image inpainting. For performance evaluation, we apply the framework to real-life clinical cytology slides that we collected, which present a variety of complexities. We evaluate the tests on sub-images cropped from 49 positive and 49 negative slides from different patients, each at a magnification of 40×. The experiments show an accurate profile of the coverage of inflammation in the whole-slide images, as well as its proportion among all the cells presented in the image. As confirmed by cytotechnologists, more than 96.0% of inflammatory cells are successfully detected at the pixel level and well inpainted in the cytology images without introducing new recognition problems.
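A minimal sketch of the second (elimination) stage, assuming a binary mask of detected inflammatory cells is already available from the first stage: classical OpenCV inpainting removes the masked cells. The file names are hypothetical, cv2.INPAINT_TELEA is one possible method choice, and the paper's own morphology-based pipeline may differ.

```python
import cv2
import numpy as np

# Hypothetical inputs: a cytology tile and the first-stage detection mask.
image = cv2.imread("cytology_tile.png")
mask = cv2.imread("inflammatory_mask.png", cv2.IMREAD_GRAYSCALE)

# Slightly dilate the mask so cell borders are fully covered before filling.
mask = cv2.dilate((mask > 0).astype(np.uint8) * 255, np.ones((5, 5), np.uint8))

# Fill the masked regions from their surroundings.
cleaned = cv2.inpaint(image, mask, inpaintRadius=5, flags=cv2.INPAINT_TELEA)
cv2.imwrite("cytology_tile_cleaned.png", cleaned)
```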
Citations: 0