Pub Date: 2019-12-01 | DOI: 10.1109/DICTA47822.2019.8945819
Sonit Singh, Sarvnaz Karimi, K. Ho-Shon, Len Hamey
Interpreting medical images and summarising them in the form of radiology reports is a challenging, tedious, and complex task. A radiologist provides a complete description of a medical image in the form of a radiology report, describing normal or abnormal findings and providing a summary for decision making. Research shows that radiology practice is error-prone due to the limited number of experts, increasing patient volumes, and the subjective nature of human perception. To reduce the number of diagnostic errors and to ease the workload of radiologists, there is a need for a computer-aided report generation system that can automatically generate a radiology report for a given medical image. We propose an encoder-decoder framework that can automatically generate radiology reports from medical images. Specifically, we use a Convolutional Neural Network as an encoder coupled with a multi-stage Stacked Long Short-Term Memory as a decoder to generate reports. We perform experiments on the Indiana University Chest X-ray collection, a publicly available dataset, to measure the effectiveness of our model. Experimental results show the effectiveness of our model in automatically generating radiology reports from medical images.
{"title":"From Chest X-Rays to Radiology Reports: A Multimodal Machine Learning Approach","authors":"Sonit Singh, Sarvnaz Karimi, K. Ho-Shon, Len Hamey","doi":"10.1109/DICTA47822.2019.8945819","DOIUrl":"https://doi.org/10.1109/DICTA47822.2019.8945819","url":null,"abstract":"Interpreting medical images and summarising them in the form of radiology reports is a challenging, tedious, and complex task. A radiologist provides a complete description of a medical image in the form of radiology report by describing normal or abnormal findings and providing a summary for decision making. Research shows that the radiology practice is error-prone due to the limited number of experts, increasing patient volumes, and the subjective nature of human perception. To reduce the number of diagnostic errors and to alleviate the task of radiologists, there is a need for a computer-aided report generation system that can automatically generate a radiology report for a given medical image. We propose an encoder-decoder based framework that can automatically generate radiology reports from medical images. Specifically, we use a Convolutional Neural Network as an encoder coupled with a multi-stage Stacked Long Short-Term Memory as a decoder to generate reports. We perform experiments on the Indiana University Chest X-ray collection, a publicly available dataset, to measure the effectiveness of our model. Experimental results show the effectiveness of our model in automatically generating radiology reports from medical images.","PeriodicalId":6696,"journal":{"name":"2019 Digital Image Computing: Techniques and Applications (DICTA)","volume":"1 1","pages":"1-8"},"PeriodicalIF":0.0,"publicationDate":"2019-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"84169788","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2019-12-01 | DOI: 10.1109/DICTA47822.2019.8945963
Junling Wang, M. Sankupellay, D. Konovalov, M. Towsey, P. Roe
(1) Background: Ecologists use acoustic recordings for long-term environmental monitoring. However, as audio recordings are opaque, obtaining meaningful information from them is a challenging task. Calculating summary indices from recordings is one way to reduce the size of audio data, but the amount of information in the summary indices is still too large. (2) Method: In this study we explore the application of social network analysis to visually and quantitatively model acoustic changes. To achieve our aim, we clustered summary indices using two algorithms, and the results were used to generate network maps. (3) Results and Discussion: The network maps allowed us to visually perceive acoustic changes within a day and to visually compare one day to another. To enable quantitative comparison, we also calculated summary values from the social network maps, including the Gini coefficient (an economic concept adopted here to estimate how unevenly the occurrences are distributed). (4) Conclusion: Social network maps and summary values provide visual and quantitative insight into acoustic changes within an environment.
{"title":"Social Network Analysis of an Acoustic Environment: The Use of Visualised Data to Characterise Natural Habitats","authors":"Junling Wang, M. Sankupellay, D. Konovalov, M. Towsey, P. Roe","doi":"10.1109/DICTA47822.2019.8945963","DOIUrl":"https://doi.org/10.1109/DICTA47822.2019.8945963","url":null,"abstract":"(1) Background: Ecologists use acoustic recordings for long term environmental monitoring. However, as audio recordings are opaque, obtaining meaningful information from them is a challenging task. Calculating summary indices from recordings is one way to reduce the size of audio data, but the amount of information of summary indices is still too big. (2) Method: In this study we explore the application of social network analysis to visually and quantitatively model acoustic changes. To achieve our aim, we clustered summary indices using two algorithms, and the results were used to generate network maps. (3) Results and Discussion: The network maps allowed us to visually perceive acoustic changes in a day and to visually compare one day to another. To enable quantitative comparison, we also calculated summary values from the social network maps, including Gini coefficient (an economical concept adopted to estimate how unevenly the occurrences are distributed). (4) Conclusion: Social network maps and summary values provide insight into acoustic changes within an environment visually and quantitatively.","PeriodicalId":6696,"journal":{"name":"2019 Digital Image Computing: Techniques and Applications (DICTA)","volume":"1 1","pages":"1-7"},"PeriodicalIF":0.0,"publicationDate":"2019-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"83109699","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2019-12-01 | DOI: 10.1109/DICTA47822.2019.8945889
Javeria Akbar, M. Shahzad, M. I. Malik, A. Ul-Hasan, F. Shafait
Landing is the most difficult phase of flight for any airborne platform. Due to the lack of efficient systems, there have been numerous landing accidents resulting in damage to onboard hardware. Vision-based systems provide a low-cost solution for detecting landing sites by providing rich textural information. To this end, this research focuses on accurate detection and localization of runways in aerial images with untidy terrain, which would help aerial platforms, especially Unmanned Aerial Vehicles (commonly referred to as drones), detect landing targets (i.e., runways) to aid automatic landing. Most prior work on runway detection is based on simple image processing algorithms with many assumptions and constraints about the precise position of the runway in a particular image. The first part of this research develops a runway detection algorithm based on state-of-the-art deep learning architectures, while the second part addresses runway localization using both deep learning and non-deep-learning methods. The proposed approach is a two-stage modular pipeline: in the first stage, aerial image classification determines whether a runway is present in a given image; in the second stage, the identified runways are localized using both conventional line detection algorithms and more recent deep learning models. Runway classification achieves an accuracy of around 97%, and the runways are localized with a mean Intersection-over-Union (IoU) score of 0.8.
{"title":"Runway Detection and Localization in Aerial Images using Deep Learning","authors":"Javeria Akbar, M. Shahzad, M. I. Malik, A. Ul-Hasan, F. Shafait","doi":"10.1109/DICTA47822.2019.8945889","DOIUrl":"https://doi.org/10.1109/DICTA47822.2019.8945889","url":null,"abstract":"Landing is the most difficult phase of the flight for any airborne platform. Due to lack of efficient systems, there have been numerous landing accidents resulting in the damage of onboard hardware. Vision based systems provides low cost solution to detect landing sites by providing rich textual information. To this end, this research focuses on accurate detection and localization of runways in aerial images with untidy terrains which would consequently help aerial platforms especially Unmanned Aerial Vehicles (commonly referred to as Drones) to detect landing targets (i.e., runways) to aid automatic landing. Most of the prior work regarding runway detection is based on simple image processing algorithms with lot of assumptions and constraints about precise position of runway in a particular image. First part of this research is to develop runway detection algorithm based on state-of-the-art deep learning architectures while the second part is runway localization using both deep learning and non-deep learning based methods. The proposed runway detection approach is two-stage modular where in the first stage the aerial image classification is achieved to find the existence of runway in that particular image. Later, in the second stage, the identified runways are localized using both conventional line detection algorithms and more recent deep learning models. The runway classification has been achieved with an accuracy of around 97% whereas the runways have been localized with mean Intersection-over-Union (IoU) score of 0.8.","PeriodicalId":6696,"journal":{"name":"2019 Digital Image Computing: Techniques and Applications (DICTA)","volume":"90 1","pages":"1-8"},"PeriodicalIF":0.0,"publicationDate":"2019-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"76054419","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2019-12-01 | DOI: 10.1109/DICTA47822.2019.8945846
A. Melville-Smith, A. Finn, R. Brinkworth
Looking for micro targets (objects in the range of 1.2×1.2 pixels) that are moving in electro-optic imagery is a relatively simple task when the background is perfectly still. Once motion is introduced into the background, such as movement from trees and bushes or ego-motion induced by a moving platform, the task becomes much more difficult. Flies have a method of dealing with such motion while still being able to detect small moving targets. This paper takes an existing model based on the fly's early visual system and compares it to existing methods of target detection. High dynamic range imagery is used, and motion is induced to reflect the effects of a rotating platform. The model of the fly's visual system is then enhanced to include local-area motion feedback to help separate the moving background from moving targets in cluttered scenes. This feedback increases the performance of the system, showing a general improvement of over 80% over the baseline model and performance 30 times better than the pixel-based adaptive segmenter and local contrast methods. These results indicate the enhanced model is able to perform micro target detection with better discrimination between targets and the background in cluttered scenes from a moving platform.
{"title":"Enhanced Micro Target Detection through Local Motion Feedback in Biologically Inspired Algorithms","authors":"A. Melville-Smith, A. Finn, R. Brinkworth","doi":"10.1109/DICTA47822.2019.8945846","DOIUrl":"https://doi.org/10.1109/DICTA47822.2019.8945846","url":null,"abstract":"Looking for micro targets (objects in the range of 1.2×1.2 pixels) that are moving in electro-optic imagery is a relatively simple task when the background is perfectly still. Once motion is introduced into the background, such as movement from trees and bushes or ego-motion induced by a moving platform, the task becomes much more difficult. Flies have a method of dealing with such motion while still being able to detect small moving targets. This paper takes an existing model based on the fly's early visual systems and compares it to existing methods of target detection. High dynamic range imagery is used and motion induced to reflect the effects of a rotating platform. The model of the fly's visual system is then enhanced to include local area motion feedback to help separate the moving background from moving targets in cluttered scenes. This feedback increases the performance of the system, showing a general improvement of over 80% from the baseline model, and 30 times better performance than the pixel-based adaptive segmenter and local contrast methods. These results indicate the enhanced model is able to perform micro target detection with better discrimination between targets and the background in cluttered scenes from a moving platform.","PeriodicalId":6696,"journal":{"name":"2019 Digital Image Computing: Techniques and Applications (DICTA)","volume":"30 1","pages":"1-8"},"PeriodicalIF":0.0,"publicationDate":"2019-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"73064802","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2019-12-01 | DOI: 10.1109/DICTA47822.2019.8945864
N. Rajapaksha, L. Ranathunga, K. Bandara
Central Retinal Vein Occlusion (CRVO) is one of the most common retinal vascular disorders in the world. Since it causes sudden, permanent vision loss, it is crucial to detect and treat CRVO immediately to avoid further vision deterioration. However, manually diagnosing CRVO is a time-consuming task that requires the constant attention of an ophthalmologist. Although a considerable amount of research has been conducted on detecting Branch Retinal Vein Occlusion, only a few approaches have been proposed to identify CRVO automatically from fundus images. Several other approaches have been proposed to detect individual symptoms of CRVO, such as hemorrhages and macular oedema, separately. This paper proposes a guided salient-feature-based approach to detect CRVO automatically. The overall system achieves 89.04% accuracy with 72.5% precision and 70.73% recall. Furthermore, it introduces novel approaches to detect retinal hemorrhages and the optic disc. As it is a guided framework based on expert knowledge, this study eliminates the haziness present in probabilistic feature selection approaches.
{"title":"Detection of Central Retinal Vein Occlusion using Guided Salient Features","authors":"N. Rajapaksha, L. Ranathunga, K. Bandara","doi":"10.1109/DICTA47822.2019.8945864","DOIUrl":"https://doi.org/10.1109/DICTA47822.2019.8945864","url":null,"abstract":"Central Retinal Vein Occlusion (CRVO) is one of the most common retinal vascular disorders in the world. Since it causes sudden permanent vision loss, it is crucial to detect and treat CRVO immediately to avoid further vision deterioration. But, manually diagnosing CRVO is a time-consuming task which requires constant attention of ophthalmologist. Although there are considerable number of research have been conducted in detecting Branch Retinal Occlusion, there are only few approaches have been proposed to identify CRVO automatically using fundus images. Multiple other approaches have been proposed to detect symptoms of CRVO such as hemorrhages, macular oedema separately. This paper has proposed guided salient feature-based approach to detect CRVO automatically. Overall system has achieved 89.04% accuracy with 72.5% precision and 70.73% recall. Furthermore, it has introduced novel approaches to detect retinal hemorrhages and optic disc. As it is a guided framework based on expert knowledge, this study eliminates the haziness presents in probabilistic feature selection approaches.","PeriodicalId":6696,"journal":{"name":"2019 Digital Image Computing: Techniques and Applications (DICTA)","volume":"124 1","pages":"1-6"},"PeriodicalIF":0.0,"publicationDate":"2019-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"75806261","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2019-12-01 | DOI: 10.1109/DICTA47822.2019.8946108
Hanif Rasyidi, S. Khan
This paper presents a segmentation-based binarization model to extract text information from historical documents using convolutional neural networks. The proposed method uses atrous convolution for feature extraction to learn useful text patterns from the document without significantly reducing the spatial size of the image. The model then combines the extracted features using a multi-scale decoder to construct a binary image that contains only the text information from the document. We train our model using a series of DIBCO competition datasets and compare the results with existing text binarization methods as well as a state-of-the-art object segmentation model. The experimental results on the H-DIBCO 2016 dataset show that our method performs strongly on the pseudo F-Score metric, surpassing various existing methods.
{"title":"Historical Document Text Binarization using Atrous Convolution and Multi-Scale Feature Decoder","authors":"Hanif Rasyidi, S. Khan","doi":"10.1109/DICTA47822.2019.8946108","DOIUrl":"https://doi.org/10.1109/DICTA47822.2019.8946108","url":null,"abstract":"This paper presents a segmentation-based binarization model to extract text information from the historical document using convolutional neural networks. The proposed method uses atrous convolution feature extraction to learn useful text pattern from the document without making a significant reduction on the spatial size of the image. The model then combines the extracted feature using a multi-scale decoder to construct a binary image that contains only text information from the document. We train our model using a series of DIBCO competition datasets and compare the results with the existing text binarization methods as well as a state-of-the-art object segmentation model. The experiment results on the H-DIBCO 2016 dataset show that our method has an excellent performance on the pseudo F-Score metric that surpasses the result of various existing methods.","PeriodicalId":6696,"journal":{"name":"2019 Digital Image Computing: Techniques and Applications (DICTA)","volume":"163 1","pages":"1-8"},"PeriodicalIF":0.0,"publicationDate":"2019-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"78606315","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2019-12-01 | DOI: 10.1109/DICTA47822.2019.8945993
M. McDonnell, Bahar Moezzi, R. Brinkworth
Robotic sorting machines are increasingly being investigated for use in recycling centers. We consider the problem of automatically classifying images of recycled beverage containers by material type, i.e., glass, plastic, metal or liquid-packaging-board, when the containers are not in their original condition, meaning their shape and size may be deformed and their coloring and packaging labels may be damaged or dirty. We describe a retrofitted computer vision system and deep convolutional neural network classifier designed for this purpose, which enabled a sorting machine's accuracy and speed to reach commercially viable benchmarks. We investigate what is most important for highly accurate container material recognition: shape, size, color, texture, or all of these? To help answer this question, we made use of style-transfer methods from the field of deep learning. We found that removing either texture or shape cues significantly reduced the accuracy of container material classification, while removing color had a minor negative effect. Unlike recent work on generic objects in ImageNet, networks trained to classify by container material type learned better from object shape than texture. Our findings show that commercial sorting of recycled beverage containers by material type at high accuracy is feasible, even when the containers are in poor condition. Furthermore, we reinforce the recent finding that convolutional neural networks can learn predominantly from either texture or shape cues.
{"title":"Using Style-Transfer to Understand Material Classification for Robotic Sorting of Recycled Beverage Containers","authors":"M. McDonnell, Bahar Moezzi, R. Brinkworth","doi":"10.1109/DICTA47822.2019.8945993","DOIUrl":"https://doi.org/10.1109/DICTA47822.2019.8945993","url":null,"abstract":"Robotic sorting machines are increasingly being investigated for use in recycling centers. We consider the problem of automatically classifying images of recycled beverage containers by material type, i.e. glass, plastic, metal or liquid-packaging-board, when the containers are not in their original condition, meaning their shape and size may be deformed, and coloring and packaging labels may be damaged or dirty. We describe a retrofitted computer vision system and deep convolutional neural network classifier designed for this purpose, that enabled a sorting machine's accuracy and speed to reach commercially viable benchmarks. We investigate what was more important for highly accurate container material recognition: shape, size, color, texture or all of these? To help answer this question, we made use of style-transfer methods from the field of deep learning. We found that removing either texture or shape cues significantly reduced the accuracy in container material classification, while removing color had a minor negative effect. Unlike recent work on generic objects in ImageNet, networks trained to classify by container material type learned better from object shape than texture. Our findings show that commercial sorting of recycled beverage containers by material type at high accuracy is feasible, even when the containers are in poor condition. Furthermore, we reinforce the recent finding that convolutional neural networks can learn predominantly either from texture cues or shape.","PeriodicalId":6696,"journal":{"name":"2019 Digital Image Computing: Techniques and Applications (DICTA)","volume":"18 1","pages":"1-8"},"PeriodicalIF":0.0,"publicationDate":"2019-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"81724979","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2019-12-01 | DOI: 10.1109/DICTA47822.2019.8946025
Ryo Miyoshi, N. Nagata, M. Hashimoto
We propose an enhanced convolutional long short-term memory (ConvLSTM) algorithm, i.e., Enhanced ConvLSTM, by adding skip connections in the spatial and temporal directions to conventional ConvLSTM to suppress vanishing gradients and make use of older information. We also propose a method that uses this algorithm to automatically recognize facial expressions from videos. The proposed facial-expression recognition method consists of two Enhanced ConvLSTM streams and two ResNet streams. The Enhanced ConvLSTM streams extract features for fine movements, and the ResNet streams extract features for rough movements. In the Enhanced ConvLSTM streams, spatio-temporal features are extracted by stacking the Enhanced ConvLSTM. We conducted experiments to compare a method using ConvLSTM with skip connections (the proposed Enhanced ConvLSTM) and a method without them (conventional ConvLSTM). The method using Enhanced ConvLSTM had 4.44% higher accuracy than the method using conventional ConvLSTM. The proposed facial-expression recognition method also achieved 45.29% accuracy, which is 2.31% higher than that of the conventional method.
{"title":"Facial-Expression Recognition from Video using Enhanced Convolutional LSTM","authors":"Ryo Miyoshi, N. Nagata, M. Hashimoto","doi":"10.1109/DICTA47822.2019.8946025","DOIUrl":"https://doi.org/10.1109/DICTA47822.2019.8946025","url":null,"abstract":"We propose an enhanced convolutional long short-term memory (ConvLSTM) algorithm, i.e., Enhanced ConvLSTM, by adding skip connections in the spatial and temporal directions to conventional ConvLSTM to suppress gradient vanishing and use older information. We also propose a method that uses this algorithm to automatically recognize facial expressions from videos. The proposed facial-expression recognition method consists of two Enhanced ConvLSTM streams and two ResNet streams. The Enhanced ConvLSTM streams extract features for fine movements, and the ResNet streams extract features for rough movements. In the Enhanced ConvLSTM streams, spatio-temporal features are extracted by stacking the Enhanced ConvLSTM. We conducted experiments to compare a method using ConvLSTM with skip connections (proposed Enhanced ConvLSTM) and a method without them (conventional ConvLSTM). A method using Enhanced CovnLSTM had a 4.44% higher accuracy than the a method using conventional ConvL-STM. Also the proposed facial-expression recognition method achieved 45.29% accuracy, which is 2.31% higher than that of the conventional method.","PeriodicalId":6696,"journal":{"name":"2019 Digital Image Computing: Techniques and Applications (DICTA)","volume":"33 1","pages":"1-6"},"PeriodicalIF":0.0,"publicationDate":"2019-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"86725191","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2019-12-01 | DOI: 10.1109/DICTA47822.2019.8946007
R. Jony, A. Woodley, Dimitri Perrin
Images uploaded to social media platforms such as Twitter and Flickr have become a potential source of information about natural disasters. However, due to their lower reliability and noisy nature, it is challenging to automatically identify social media images that genuinely contain evidence of natural disasters. Visual features have been popular for classifying these images, while the associated metadata are often ignored or exploited only to a limited extent. To test their potential, we employed visual features and metadata separately to identify social media images with flooding evidence. For visual feature extraction, we utilized three advanced Convolutional Neural Networks (CNNs) pre-trained on two different types of datasets and used a simple neural network for classification. The results demonstrate that combining the two types of visual features has a positive impact on distinguishing natural disaster images. From the metadata, we considered only the textual metadata: we combined all textual metadata, extracted bi-gram features, and then employed a Support Vector Machine (SVM) for the classification task. The results show that combining the textual metadata improves classification accuracy compared to the individual fields. The results also demonstrate that although the visual feature approach outperforms the metadata approach, metadata have a certain capability to classify these images. For instance, the proposed visual feature approach achieved a result (MAP = 95.15) similar to the top visual feature approaches presented at MediaEval 2017, while the metadata approach (MAP = 84.52) outperformed the metadata methods presented there. For the experiments, we utilized the dataset from the MediaEval 2017 Disaster Image Retrieval from Social Media (DIRSM) task and compared our results with the other methods presented in the task (11 participants).
{"title":"Flood Detection in Social Media Images using Visual Features and Metadata","authors":"R. Jony, A. Woodley, Dimitri Perrin","doi":"10.1109/DICTA47822.2019.8946007","DOIUrl":"https://doi.org/10.1109/DICTA47822.2019.8946007","url":null,"abstract":"Images uploaded to social media platforms such as Twitter and Flickr have become a potential source of information about natural disasters. However, due to their lower reliability and noisy nature, it is challenging to automatically identify social media images that genuinely contain evidence of natural disasters. Visual features have been popular for classifying these images while the associated metadata are often ignored or exploited only to a limited extent. To test their potential, we employed them separately to identify social media images with flooding evidence. For visual feature extraction, we utilized three advanced Convolutional Neural Networks (CNNs) pre-trained on two different types of datasets and used a simple neural network for the classification. The results demonstrate that the combination of two types of visual features have a positive impact on distinguishing natural disaster images. From metadata, we considered only the textual metadata. Here, we combined all textual metadata and extracted bi-gram features. Then we employed a Support Vector Machine (SVM) for the classification task. The results show that the combination of the textual metadata can improve the classification accuracy compared to their individual counterparts. The results also demonstrate that although the visual feature approach outperforms metadata approach, metadata have certain capability to classify these images. For instance, the proposed visual feature approach achieved a similar result (MAP = 95.15) compared to the top visual feature approaches presented in MediaEval 2017, the metadata approach outperformed (MAP = 84.52) presented metadata methods. For the experiments, we utilized dataset from MediaEval 2017 Disaster Image Retrieval from Social Media (DIRSM) task and compared the achieved results with the other methods presented (Number of participants = 11) of the task.","PeriodicalId":6696,"journal":{"name":"2019 Digital Image Computing: Techniques and Applications (DICTA)","volume":"72 1","pages":"1-8"},"PeriodicalIF":0.0,"publicationDate":"2019-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"84017791","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2019-12-01 | DOI: 10.1109/DICTA47822.2019.8946065
Jing Ke, Junwei Deng, Yizhou Lu, Dadong Wang, Yang Song, Huijuan Zhang
In automatic cytology image diagnosis, false positives or false negatives often arise from inflammatory cells that obscure the identification of abnormal or normal cells. These phenotypes appear similar in shape, color and texture to the cells to be detected. In this paper, to assess inflammation and eliminate its interference with recognizing cells of interest, we propose a two-stage framework containing a deep learning based neural network that detects and estimates the proportion of inflammatory cells, and a morphology based image processing architecture that eliminates them from the digital images through image inpainting. For performance evaluation, we apply the framework to our collected real-life clinical cytology slides, which present a variety of complexities. We evaluate the tests on sub-images cropped from 49 positive and 49 negative slides from different patients, each at a magnification of 40×. The experiments show an accurate profile of the coverage of inflammation in the whole-slide images, as well as their proportion among all the cells present in the image. Confirmed by cytotechnologists, more than 96.0% of inflammatory cells are successfully detected at the pixel level and well inpainted in the cytology images without introducing new recognition problems.
{"title":"Assessment and Elimination of Inflammatory Cell: A Machine Learning Approach in Digital Cytology","authors":"Jing Ke, Junwei Deng, Yizhou Lu, Dadong Wang, Yang Song, Huijuan Zhang","doi":"10.1109/DICTA47822.2019.8946065","DOIUrl":"https://doi.org/10.1109/DICTA47822.2019.8946065","url":null,"abstract":"In automatic cytology image diagnosis, the false-positive or false-negative often come up with inflammatory cells that obscure the identification of abnormal or normal cells. These phenotypes are presented in the similar appearance in shape, color and texture with cells to detect. In this paper, to evaluate the inflammation and eliminate their disturbances of recognizing cells of interests, we propose a two-stage framework containing a deep learning based neural network to detect and estimate the proportions of inflammatory cells, and a morphology based image processing architecture to eliminate them from the digital images with image inpainting. For performance evaluation, we apply the framework to our collected real-life clinical cytology slides presented with a variety of complexities. We evaluate the tests on sub-images cropped from 49 positive and 49 negative slides from different patients, each at the magnification rate of 40×. The experiments shows an accurate profile of the coverage of inflammation in the whole slide images, as well as their proportion in all the cells presented in the image. Confirmed by cytotechnologists, more than 96.0% of inflammatory cells are successfully detected at pixel level and well-inpainted in the cytology images without bringing new recognition problem.","PeriodicalId":6696,"journal":{"name":"2019 Digital Image Computing: Techniques and Applications (DICTA)","volume":"20 1","pages":"1-8"},"PeriodicalIF":0.0,"publicationDate":"2019-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"84075531","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}