Pub Date: 2019-12-01 | DOI: 10.1109/DICTA47822.2019.8945818
Nicholas Burleigh, Jordan King, T. Bräunl
In this paper we look at Deep Learning methods using TensorFlow for autonomous driving tasks. Using scale-model vehicles in a traffic scenario similar to the Audi Autonomous Driving Cup and the Carolo Cup, we successfully applied Deep Learning stacks to the two independent tasks of lane keeping and traffic sign recognition.
{"title":"Deep Learning for Autonomous Driving","authors":"Nicholas Burleigh, Jordan King, T. Bräunl","doi":"10.1109/DICTA47822.2019.8945818","DOIUrl":"https://doi.org/10.1109/DICTA47822.2019.8945818","url":null,"abstract":"In this paper we look at Deep Learning methods using TensorFlow for autonomous driving tasks. Using scale model vehicles in a traffic scenario similar to the Audi Autonomous Driving Cup and the Carolo Cup, we successfully used Deep Learning stacks for the two independent tasks of lane keeping and traffic sign recognition.","PeriodicalId":6696,"journal":{"name":"2019 Digital Image Computing: Techniques and Applications (DICTA)","volume":"21 1","pages":"1-8"},"PeriodicalIF":0.0,"publicationDate":"2019-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"87600317","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2019-12-01 | DOI: 10.1109/DICTA47822.2019.8946075
U. Somaratne, Kok Wai Wong, J. Parry, Ferdous Sohel, Xuequn Wang, Hamid Laga
Follicular Lymphoma (FL) is a type of lymphoma that grows silently and is usually diagnosed in its later stages. To increase patients' survival rates, FL requires a fast diagnosis. While, traditionally, the diagnosis is performed by visual inspection of Whole Slide Images (WSI), recent advances in deep learning techniques provide an opportunity to automate this process. The main challenge, however, is that WSI images often exhibit large variations across different operating environments, hereinafter referred to as sites. As such, deep learning models usually require retraining using labelled data from each new site. This is, however, not feasible since the labelling process requires pathologists to visually inspect and label each sample. In this paper, we propose a deep learning model that uses transfer learning with fine-tuning to improve the identification of Follicular Lymphoma in images from new sites that are different from those used during training. Our results show that the proposed approach improves the prediction accuracy by 12% to 52% compared to the initial prediction of the model for images from a new site in the target environment.
{"title":"Improving Follicular Lymphoma Identification using the Class of Interest for Transfer Learning","authors":"U. Somaratne, Kok Wai Wong, J. Parry, Ferdous Sohel, Xuequn Wang, Hamid Laga","doi":"10.1109/DICTA47822.2019.8946075","DOIUrl":"https://doi.org/10.1109/DICTA47822.2019.8946075","url":null,"abstract":"Follicular Lymphoma (FL) is a type of lymphoma that grows silently and is usually diagnosed in its later stages. To increase the patients' survival rates, FL requires a fast diagnosis. While, traditionally, the diagnosis is performed by visual inspection of Whole Slide Images (WSI), recent advances in deep learning techniques provide an opportunity to automate this process. The main challenge, however, is that WSI images often exhibit large variations across different operating environments, hereinafter referred to as sites. As such, deep learning models usually require retraining using labeled data from each new site. This is, however, not feasible since the labelling process requires pathologists to visually inspect and label each sample. In this paper, we propose a deep learning model that uses transfer learning with fine-tuning to improve the identification of Follicular Lymphoma on images from new sites that are different from those used during training. Our results show that the proposed approach improves the prediction accuracy with 12% to 52% compared to the initial prediction of the model for images from a new site in the target environment.","PeriodicalId":6696,"journal":{"name":"2019 Digital Image Computing: Techniques and Applications (DICTA)","volume":"1 1","pages":"1-7"},"PeriodicalIF":0.0,"publicationDate":"2019-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"90485127","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2019-12-01 | DOI: 10.1109/DICTA47822.2019.8946006
Hengjia Li, Chuong V. Nguyen
Image-based 3D reconstruction or 3D photogrammetry of small-scale objects, including insects and biological specimens, is challenging due to the use of high-magnification lenses with an inherently limited depth of field, and to the objects' fine structures and complex surface properties. Because of these challenges, traditional 3D reconstruction techniques cannot be applied without suitable image pre-processing. One such pre-processing technique is multifocus stacking, which combines a set of partially focused images captured from the same viewing angle to create a single in-focus image. Traditional multifocus image capture moves the camera along a macro rail, and the resulting changes in image scale and shift are not properly accounted for by multifocus stacking techniques. As a consequence, the resulting in-focus images contain artifacts that violate perspective image formation, and a 3D reconstruction using such images will fail to produce an accurate 3D model of the object. This paper shows how this problem can be solved effectively by a new multifocus stacking procedure that includes a new Fixed-Lens Multifocus Capture and camera calibration for image scale and shift. Initial experimental results confirm our expectation and show that the camera poses estimated from fixed-lens images are at least three times less noisy than those from conventional moving-lens images.
{"title":"Perspective-Consistent Multifocus Multiview 3D Reconstruction of Small Objects","authors":"Hengjia Li, Chuong V. Nguyen","doi":"10.1109/DICTA47822.2019.8946006","DOIUrl":"https://doi.org/10.1109/DICTA47822.2019.8946006","url":null,"abstract":"Image-based 3D reconstruction or 3D photogrammetry of small-scale objects including insects and biological specimens is challenging due to the use of high magnification lens with inherent limited depth of field, and the object's fine structures and complex surface properties. Due to these challenges, traditional 3D reconstruction techniques cannot be applied without suitable image pre-processings. One such preprocessing technique is multifocus stacking that combines a set of partially focused images captured from the same viewing angle to create a single in-focus image. Traditional multifocus image capture uses a camera on a macro rail. Furthermore, the scale and shift are not properly considered by multifocus stacking techniques. As a consequence, the resulting in-focus images contain artifacts that violate perspective image formation. A 3D reconstruction using such images will fail to produce an accurate 3D model of the object. This paper shows how this problem can be solved effectively by a new multifocus stacking procedure which includes a new Fixed-Lens Multifocus Capture and camera calibration for image scale and shift. Initial experimental results are presented to confirm our expectation and show that the camera poses of fixed-lens images are at least 3-times less noisy than those of conventional moving lens images.","PeriodicalId":6696,"journal":{"name":"2019 Digital Image Computing: Techniques and Applications (DICTA)","volume":"5 1","pages":"1-8"},"PeriodicalIF":0.0,"publicationDate":"2019-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"89370970","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2019-12-01 | DOI: 10.1109/DICTA47822.2019.8945816
Osama Rasheed, A. Rextin, Mehwish Nasim
With the increasing popularity of smartphones, and with users as young as two years old, smartphones can be a hazard for young children in terms of health concerns, time wastage and exposure to inappropriate material; conversely, very young children can be a threat to the smartphone as well, e.g. by draining the battery, making unwanted calls or text messages, or causing physical damage. To protect the smartphone and the child from each other, devices need user identification so that certain functions, for instance restricting adult content, can be applied once a user is identified as a child. This paper is a user study that aims at distinguishing the touch patterns of adults and children. To this end, we collected data from 60 people (30 adults and 30 children) while they performed the six basic tasks carried out on touch devices, in order to find differences between the touch gestures of children and adults. We first perform an exploratory data analysis. We then model the problem as a supervised binary classification problem and use the data as input to different machine learning algorithms to determine whether a user previously unknown to the machine can be classified as an adult or a child. Our work shows that there are differences in touch gestures between children and adults which are sufficient for user group identification.
{"title":"Adult or Child: Recognizing through Touch Gestures on Smartphones","authors":"Osama Rasheed, A. Rextin, Mehwish Nasim","doi":"10.1109/DICTA47822.2019.8945816","DOIUrl":"https://doi.org/10.1109/DICTA47822.2019.8945816","url":null,"abstract":"With the increasing popularity of smartphones and its audience including children as young as 2 year old, smartphones can be a hazard for young children in terms of health concerns, time wastage, viewing of inappropriate material and conversely children who are too young can be a threat to the smartphone as well e.g, causing battery drainage, making unwanted calls/text messages, doing physical damage etc. In order to protect the smartphone and children from each other, we require user identification on our devices so the device could perform certain functions for instance restricting adult content once a user is identified as a child. This paper is a user study that aims at detecting the touch patterns of adults and children. To this end we collected data from 60 people, 30 adults and 30 children while they were asked to perform the 6 basic tasks that are performed on touch devices to find the differences in the touch gestures of children from adults. We first perform an exploratory data analysis. We then model the problem as a supervised binary classification problem and use the data as input for different machine learning algorithms to find whether we can classify a user previously unknown to the machine as an adult or a child. Our work shows there are differences in touch gestures among children and adults which are sufficient for user group identification.","PeriodicalId":6696,"journal":{"name":"2019 Digital Image Computing: Techniques and Applications (DICTA)","volume":"250 1","pages":"1-8"},"PeriodicalIF":0.0,"publicationDate":"2019-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"78358532","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2019-12-01 | DOI: 10.1109/DICTA47822.2019.8945835
G. Silva, Inês Domingues, Hugo Duarte, João A. M. Santos
Positron emission tomography (PET) imaging is a nuclear medicine functional imaging technique; as such, it is expensive to perform and subjects the human body to radiation. It would therefore be ideal to find a technique that allows these images to be generated automatically. This generation can be done using deep learning techniques, more specifically generative adversarial networks. As far as we are aware, there have been no attempts at PET-to-PET generation to date. The objective of this article is to develop a generative adversarial network capable of generating after-treatment PET images from pre-treatment PET images. In order to develop this model, PET scans, originally in 3D, were converted to 2D images. Two methods were used: hand-picking each slice, and maximum intensity projection. After extracting the slices, several image co-registration techniques were applied in order to find which one would produce the best results according to two metrics: peak signal-to-noise ratio and structural similarity index. Results of 18.8 and 0.856, respectively, were achieved using data from 90 patients with Hodgkin's Lymphoma.
{"title":"Automatic Generation of Lymphoma Post-Treatment PETs using Conditional-GANs","authors":"G. Silva, Inês Domingues, Hugo Duarte, João A. M. Santos","doi":"10.1109/DICTA47822.2019.8945835","DOIUrl":"https://doi.org/10.1109/DICTA47822.2019.8945835","url":null,"abstract":"Positron emission tomography (PET) imaging is a nuclear medicine functional imaging technique and as such it is expensive to perform and subjects the human body to radiation. Therefore, it would be ideal to find a technique that could allow for these images to be generated automatically. This generation can be done using deep learning techniques, more specifically with generative adversarial networks. As far as we are aware there have been no attempts at PET-to-PET generation to date. The objective of this article is to develop a generative adversarial network capable of generating after-treatment PET images from pre-treatment PET images. In order to develop this model, PET scans, originally in 3D, were converted to 2D images. Two methods were used, hand picking each slice and maximum intensity projection. After extracting the slices, several image co-registration techniques were applied in order to find which one would produce the best results according to two metrics, peak signal-to-noise ratio and structural similarity index. They achieved results of 18.8 and 0.856, respectively, using data from 90 patients with Hodgkin's Lymphoma.","PeriodicalId":6696,"journal":{"name":"2019 Digital Image Computing: Techniques and Applications (DICTA)","volume":"4 1","pages":"1-6"},"PeriodicalIF":0.0,"publicationDate":"2019-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"74253307","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2019-12-01 | DOI: 10.1109/DICTA47822.2019.8945924
Elena M. Vella, Anee Azim, H. Gaetjens, Boris Repasky, Timothy Payne
Current vehicle detection and tracking in imagery characterised by large ground coverage, low resolution and low frame rate, such as Wide Area Motion Imagery (WAMI), does not reliably sustain vehicle tracks through stop-start movement profiles. This limits the continuity of tracks and their usefulness in higher-level analysis such as pattern-of-behaviour or activity analysis. We develop and implement a two-step registration method to create well-registered images, which are used to generate a novel low-noise representation of the static background context that is fed into our Context Convolutional Neural Network (C-CNN) detector. This network is unique in that the C-CNN learns changing features in the scene and thus produces reliable, sustained vehicle detection independent of motion. A quantitative evaluation against WAMI imagery is presented for a Region of Interest (ROI) of the WPAFB 2009 annotated dataset [1]. We apply a Kalman filter tracker with WAMI-specific adaptations to the single-frame C-CNN detections, and evaluate the results with respect to the tracking ground truth. We show improved detection and sustained tracking in WAMI using static background contextual information, and reliably detect all vehicles that move, including vehicles that become stationary for short periods of time as they move through stop-start manoeuvres.
{"title":"Improved Detection for WAMI using Background Contextual Information","authors":"Elena M. Vella, Anee Azim, H. Gaetjens, Boris Repasky, Timothy Payne","doi":"10.1109/DICTA47822.2019.8945924","DOIUrl":"https://doi.org/10.1109/DICTA47822.2019.8945924","url":null,"abstract":"Current vehicle detection and tracking in imagery characterised by large ground coverage, low resolution and low frame rate data, such as Wide Area Motion Imagery (WAMI), does not reliably sustain vehicle tracks through start-stop movement profiles. This limits the continuity of tracks and its usefulness in higher level analysis such as pattern of behaviour or activity analysis. We develop and implement a two-step registration method to create well-registered images which are used to generate a novel low-noise representation of the static background context which is fed into our Context Convolutional Neural Network (C-CNN) detector. This network is unique as the C-CCN learns changing features in the scene and thus produces reliable, sustained vehicle detection independent of motion. A quantitative evaluation against WAMI imagery is presented for a Region of Interest (ROI) of the WPAFB 2009 annotated dataset [1]. We apply a Kalman filter tracker with WAMI-specific adaptions to the single frame C-CNN detections, and evaluate the results with respect to the tracking ground truth. We show improved detection and sustained tracking in WAMI using static background contextual information and reliably detect all vehicles that move, including vehicles that become stationary for short periods of time as they move through stop-start manoeuvres.","PeriodicalId":6696,"journal":{"name":"2019 Digital Image Computing: Techniques and Applications (DICTA)","volume":"145 1","pages":"1-9"},"PeriodicalIF":0.0,"publicationDate":"2019-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"80461478","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2019-12-01 | DOI: 10.1109/DICTA47822.2019.8945841
A. Johnston, G. Carneiro
Capturing large amounts of accurate and diverse 3D data for training is often time-consuming and expensive, requiring either many hours of artist time to model each object, or scanning of real-world objects using depth sensors or structure-from-motion techniques. To address this problem, we present a method for reconstructing 3D textured point clouds from single input images without any 3D ground-truth training data. We recast the problem of 3D point cloud estimation as two separate processes: novel view synthesis, and depth/shape estimation from the novel-view images. To train our models we leverage recent advances in deep generative modelling and self-supervised learning. We show that our method outperforms recent supervised methods, and achieves state-of-the-art results when compared with another recently proposed unsupervised method. Furthermore, we show that our method is capable of recovering textural information, which is often missing from many previous approaches that rely on supervision.
{"title":"Single View 3D Point Cloud Reconstruction using Novel View Synthesis and Self-Supervised Depth Estimation","authors":"A. Johnston, G. Carneiro","doi":"10.1109/DICTA47822.2019.8945841","DOIUrl":"https://doi.org/10.1109/DICTA47822.2019.8945841","url":null,"abstract":"Capturing large amounts of accurate and diverse 3D data for training is often time consuming and expensive, either requiring many hours of artist time to model each object, or to scan from real world objects using depth sensors or structure from motion techniques. To address this problem, we present a method for reconstructing 3D textured point clouds from single input images without any 3D ground truth training data. We recast the problem of 3D point cloud estimation as that of performing two separate processes, a novel view synthesis and a depth/shape estimation from the novel view images. To train our models we leverage the recent advances in deep generative modelling and self-supervised learning. We show that our method outperforms recent supervised methods, and achieves state of the art results when compared with another recently proposed unsupervised method. Furthermore, we show that our method is capable of recovering textural information which is often missing from many previous approaches that rely on supervision.","PeriodicalId":6696,"journal":{"name":"2019 Digital Image Computing: Techniques and Applications (DICTA)","volume":"75 1","pages":"1-8"},"PeriodicalIF":0.0,"publicationDate":"2019-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"73299191","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2019-12-01 | DOI: 10.1109/DICTA47822.2019.8945925
Uzair Nadeem, Bennamoun, Ferdous Sohel, R. Togneri
Coral reefs are vital for the marine ecosystem and the fishing industry. Automatic classification of corals is essential for the preservation and study of coral reefs. However, significant intra-class variation and inter-class similarity among coral genera, as well as the challenges of underwater illumination, present a great hindrance to automatic classification. We propose an end-to-end trainable Deep Fusion Net for the classification of corals from two types of images. The network takes two simultaneous inputs of reflectance and fluorescence images. It is composed of three branches: Reflectance, Fluorescence and Integration. The branches are first trained individually and then fused together. Finally, the Deep Fusion Net is trained end-to-end for the classification of different coral genera and other non-coral classes. Experiments on the challenging Eilat Fluorescence Coral dataset show that the Deep Fusion Net achieves superior classification accuracy compared to other methods.
{"title":"Deep Fusion Net for Coral Classification in Fluorescence and Reflectance Images","authors":"Uzair Nadeem, Bennamoun, Ferdous Sohel, R. Togneri","doi":"10.1109/DICTA47822.2019.8945925","DOIUrl":"https://doi.org/10.1109/DICTA47822.2019.8945925","url":null,"abstract":"Coral reefs are vital for marine ecosystem and fishing industry. Automatic classification of corals is essential for the preservation and study of coral reefs. However, significant intra-class variations and inter-class similarity among coral genera, as well as the challenges of underwater illumination present a great hindrance for the automatic classification. We propose an end-to-end trainable Deep Fusion Net for the classification of corals from two types of images. The network takes two simultaneous inputs of reflectance and fluorescence images. It is composed of three branches: Reflectance, Fluorescence and Integration. The branches are first trained individually and then fused together. Finally, the Deep Fusion Net is trained end-to-end for the classification of different coral genera and other non-coral classes. Experiments on the challenging Eliat Fluorescence Coral dataset show that the Deep Fusion net achieves superior classification accuracy compared to other methods.","PeriodicalId":6696,"journal":{"name":"2019 Digital Image Computing: Techniques and Applications (DICTA)","volume":"41 1","pages":"1-7"},"PeriodicalIF":0.0,"publicationDate":"2019-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"77496942","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2019-12-01 | DOI: 10.1109/DICTA47822.2019.8946089
J. Kugelman, D. Alonso-Caneiro, Scott A. Read, Stephen J. Vincent, F. Chen, M. Collins
The segmentation of tissue layers in optical coherence tomography (OCT) images of the internal lining of the eye (the retina and choroid) is commonly performed for clinical and research purposes. However, manual segmentation of the numerous scans is time-consuming, tedious and error-prone. Fortunately, machine learning-based automated approaches for image segmentation tasks are becoming more common, although poor performance of these methods can result from a lack of quantity or diversity in the data used to train the models. Recently, generative adversarial networks (GANs) have demonstrated the ability to generate synthetic images, which may be useful for data augmentation purposes. Here, we propose the application of GANs to construct chorio-retinal patches from OCT images, which may be used to augment data for a patch-based approach to boundary segmentation. Given the complexity of GAN training, a range of experiments is performed to optimize performance. We show that it is feasible to generate 32×32 versions of such patches that are visually indistinguishable from their real variants. In the best case, the segmentation performance when utilizing solely synthetic data to train the model is nearly comparable to that with real data on all three layer boundaries of interest. The difference in mean absolute error for the inner boundary of the inner limiting membrane (ILM) [0.50 vs. 0.48 pixels], the outer boundary of the retinal pigment epithelium (RPE) [0.48 vs. 0.44 pixels] and the choroid-scleral interface (CSI) [4.42 vs. 4.00 pixels] shows the performance using synthetic data to be only marginally inferior. These findings highlight the potential use of GANs for data augmentation in future work with chorio-retinal OCT images.
{"title":"Constructing Synthetic Chorio-Retinal Patches using Generative Adversarial Networks","authors":"J. Kugelman, D. Alonso-Caneiro, Scott A. Read, Stephen J. Vincent, F. Chen, M. Collins","doi":"10.1109/DICTA47822.2019.8946089","DOIUrl":"https://doi.org/10.1109/DICTA47822.2019.8946089","url":null,"abstract":"The segmentation of tissue layers in optical coherence tomography (OCT) images of the internal lining of the eye (the retina and choroid) is commonly performed for clinical and research purposes. However, manual segmentation of the numerous scans is time consuming, tedious and error-prone. Fortunately, machine learning-based automated approaches for image segmentation tasks are becoming more common. However, poor performance of these methods can result from a lack of quantity or diversity in the data used to train the models. Recently, generative adversarial networks (GANs) have demonstrated the ability to generate synthetic images, which may be useful for data augmentation purposes. Here, we propose the application of GANs to construct chorio-retinal patches from OCT images which may be used to augment data for a patch-based approach to boundary segmentation. Given the complexity of GAN training, a range of experiments are performed to optimize performance. We show that it is feasible to generate 32×32 versions of such patches that are visually indistinguishable from their real variants. In the best case, the segmentation performance utilizing solely synthetic data to train the model is nearly comparable to real data on all three layer boundaries of interest. The difference in mean absolute error for the inner boundary of the inner limiting membrane (ILM) [0.50 vs. 0.48 pixels], outer boundary of the retinal pigment epithelium (RPE) [0.48 vs. 0.44 pixels] and choroid-scleral interface (CSI) [4.42 vs. 4.00 pixels] shows the performance using synthetic data to be only marginally inferior. These findings highlight the potential use of GANs for data augmentation in future work with chorio-retinal OCT images.","PeriodicalId":6696,"journal":{"name":"2019 Digital Image Computing: Techniques and Applications (DICTA)","volume":"153 1","pages":"1-8"},"PeriodicalIF":0.0,"publicationDate":"2019-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"91473654","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2019-12-01 | DOI: 10.1109/DICTA47822.2019.8946002
M. Uzair, R. Brinkworth, A. Finn
Thermal infrared imaging is an effective modality for developing robust methods of small target detection at large distances. However, low target contrast and high background clutter are two main challenges that limit detection performance. We present bio-inspired spatio-temporal pre-processing of infrared video frames to deal with these challenges. The neurons in the early vision system of small flying insects have a remarkable capability for noise filtering, contrast enhancement, signal compression and clutter suppression. These neurons have previously been modelled computationally in two stages using a combination of linear and non-linear processing layers. The first stage models the adaptive temporal filtering mechanisms of insect photoreceptor cells. It improves the signal-to-noise ratio, enhances target-background discrimination and expands the possible range of signal variability. The second stage models the spatio-temporal adaptive filtering in the large monopolar cells, which removes redundancy and increases target contrast. To show the performance gain achieved by such bio-inspired pre-processing, we perform small target detection experiments on real-world high bit-depth infrared video sequences. Results show that the pre-processing based on early biological vision significantly improves the performance of four standard infrared small moving target detection techniques. Specifically, the spatio-temporal pre-processing increases the detection rate (at a 10⁻⁵ false alarm rate) of the best performing method by 100%, and of the other methods by up to 630%. Our results indicate the strong potential of this bio-inspired processing for allowing systems to detect smaller targets at longer distances in more cluttered environments.
{"title":"Insect-Inspired Small Moving Target Enhancement in Infrared Videos","authors":"M. Uzair, R. Brinkworth, A. Finn","doi":"10.1109/DICTA47822.2019.8946002","DOIUrl":"https://doi.org/10.1109/DICTA47822.2019.8946002","url":null,"abstract":"Thermal infrared imaging is an effective modality for developing robust methods of small target detection at large distances. However, low target contrast and high background clutter are two main challenges that limit the detection performance. We present bio-inspired spatio-temporal pre-processing of infrared video frames to deal with such challenges. The neurons in the early vision system of small flying insects have remarkable capability for noise filtering, contrast enhancement, signal compression and clutter suppression. These neurons were computationally modeled previously in two stages using a combination of linear and non-linear processing layers. The first stage models the adaptive temporal filtering mechanisms of insect photoreceptor cells. It improves the signal-to-noise-ratio, enhances target background discrimination and expands the possible range of signal variability. The second stage models the spatio-temporal adaptive filtering in the large monopolar cells that remove redundancy and increase target contrast. To show the performance gain achieved by such bio-inspired preprocessing, we perform small target detection experiments on real world high bit-depth infrared video sequences. Results show that the early biological vision based pre-processing significantly improves the performance of four standard infrared small moving target detection techniques. Specifically, the spatio-temporal preprocessing increase the detection rate (at 10−5 false alarm rate) of the best performing method by 100% and by up to 630% for the other methods. Our results are indicative of the strong potential of the bio-processing for allowing systems to detect smaller targets at longer distances in more cluttered environments.","PeriodicalId":6696,"journal":{"name":"2019 Digital Image Computing: Techniques and Applications (DICTA)","volume":"33 1","pages":"1-8"},"PeriodicalIF":0.0,"publicationDate":"2019-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"87356475","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}