
Latest publications from 2019 Digital Image Computing: Techniques and Applications (DICTA)

Deep Learning for Autonomous Driving
Pub Date: 2019-12-01 DOI: 10.1109/DICTA47822.2019.8945818
Nicholas Burleigh, Jordan King, T. Bräunl
In this paper we look at Deep Learning methods using TensorFlow for autonomous driving tasks. Using scale-model vehicles in a traffic scenario similar to the Audi Autonomous Driving Cup and the Carolo Cup, we successfully applied Deep Learning stacks to the two independent tasks of lane keeping and traffic sign recognition.
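A minimal sketch of what a TensorFlow lane-keeping model of the kind described above might look like: a small convolutional network that regresses a steering angle from a camera frame. The input size and layer configuration are illustrative assumptions, not the paper's actual architecture.

```python
# Hypothetical lane-keeping regressor: camera frame in, steering angle out.
import tensorflow as tf

def build_lane_keeping_model(input_shape=(66, 200, 3)):  # input size assumed
    model = tf.keras.Sequential([
        tf.keras.layers.Rescaling(1.0 / 255, input_shape=input_shape),
        tf.keras.layers.Conv2D(24, 5, strides=2, activation="relu"),
        tf.keras.layers.Conv2D(36, 5, strides=2, activation="relu"),
        tf.keras.layers.Conv2D(48, 5, strides=2, activation="relu"),
        tf.keras.layers.Conv2D(64, 3, activation="relu"),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(100, activation="relu"),
        tf.keras.layers.Dense(1),  # steering angle (regression output)
    ])
    model.compile(optimizer="adam", loss="mse")
    return model

model = build_lane_keeping_model()
model.summary()
```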
{"title":"Deep Learning for Autonomous Driving","authors":"Nicholas Burleigh, Jordan King, T. Bräunl","doi":"10.1109/DICTA47822.2019.8945818","DOIUrl":"https://doi.org/10.1109/DICTA47822.2019.8945818","url":null,"abstract":"In this paper we look at Deep Learning methods using TensorFlow for autonomous driving tasks. Using scale model vehicles in a traffic scenario similar to the Audi Autonomous Driving Cup and the Carolo Cup, we successfully used Deep Learning stacks for the two independent tasks of lane keeping and traffic sign recognition.","PeriodicalId":6696,"journal":{"name":"2019 Digital Image Computing: Techniques and Applications (DICTA)","volume":"21 1","pages":"1-8"},"PeriodicalIF":0.0,"publicationDate":"2019-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"87600317","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 3
Improving Follicular Lymphoma Identification using the Class of Interest for Transfer Learning
Pub Date: 2019-12-01 DOI: 10.1109/DICTA47822.2019.8946075
U. Somaratne, Kok Wai Wong, J. Parry, Ferdous Sohel, Xuequn Wang, Hamid Laga
Follicular Lymphoma (FL) is a type of lymphoma that grows silently and is usually diagnosed in its later stages. To increase patients' survival rates, FL requires a fast diagnosis. While, traditionally, the diagnosis is performed by visual inspection of Whole Slide Images (WSI), recent advances in deep learning techniques provide an opportunity to automate this process. The main challenge, however, is that WSI images often exhibit large variations across different operating environments, hereinafter referred to as sites. As such, deep learning models usually require retraining using labeled data from each new site. This is, however, not feasible, since the labelling process requires pathologists to visually inspect and label each sample. In this paper, we propose a deep learning model that uses transfer learning with fine-tuning to improve the identification of Follicular Lymphoma in images from new sites that differ from those used during training. Our results show that the proposed approach improves prediction accuracy by 12% to 52% compared to the initial prediction of the model for images from a new site in the target environment.
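A sketch of the general transfer-learning-with-fine-tuning recipe the abstract describes, in Keras. The ImageNet-pretrained ResNet50 backbone, input size and two-class head are assumptions for illustration; the paper's actual backbone and its class-of-interest selection are not reproduced here.

```python
# Stage 1: train a new classification head on a frozen pretrained backbone.
import tensorflow as tf

base = tf.keras.applications.ResNet50(include_top=False, weights="imagenet",
                                      input_shape=(224, 224, 3), pooling="avg")
base.trainable = False  # freeze the backbone for the first stage

model = tf.keras.Sequential([
    base,
    tf.keras.layers.Dense(2, activation="softmax"),  # FL vs. non-FL (assumed)
])
model.compile(optimizer=tf.keras.optimizers.Adam(1e-3),
              loss="sparse_categorical_crossentropy", metrics=["accuracy"])
# ... fit on labelled patches from the source site ...

# Stage 2: unfreeze the backbone and fine-tune at a much lower learning rate
# on the (small) set of images available from the new site.
base.trainable = True
model.compile(optimizer=tf.keras.optimizers.Adam(1e-5),
              loss="sparse_categorical_crossentropy", metrics=["accuracy"])
```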
{"title":"Improving Follicular Lymphoma Identification using the Class of Interest for Transfer Learning","authors":"U. Somaratne, Kok Wai Wong, J. Parry, Ferdous Sohel, Xuequn Wang, Hamid Laga","doi":"10.1109/DICTA47822.2019.8946075","DOIUrl":"https://doi.org/10.1109/DICTA47822.2019.8946075","url":null,"abstract":"Follicular Lymphoma (FL) is a type of lymphoma that grows silently and is usually diagnosed in its later stages. To increase the patients' survival rates, FL requires a fast diagnosis. While, traditionally, the diagnosis is performed by visual inspection of Whole Slide Images (WSI), recent advances in deep learning techniques provide an opportunity to automate this process. The main challenge, however, is that WSI images often exhibit large variations across different operating environments, hereinafter referred to as sites. As such, deep learning models usually require retraining using labeled data from each new site. This is, however, not feasible since the labelling process requires pathologists to visually inspect and label each sample. In this paper, we propose a deep learning model that uses transfer learning with fine-tuning to improve the identification of Follicular Lymphoma on images from new sites that are different from those used during training. Our results show that the proposed approach improves the prediction accuracy with 12% to 52% compared to the initial prediction of the model for images from a new site in the target environment.","PeriodicalId":6696,"journal":{"name":"2019 Digital Image Computing: Techniques and Applications (DICTA)","volume":"1 1","pages":"1-7"},"PeriodicalIF":0.0,"publicationDate":"2019-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"90485127","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 6
Perspective-Consistent Multifocus Multiview 3D Reconstruction of Small Objects
Pub Date: 2019-12-01 DOI: 10.1109/DICTA47822.2019.8946006
Hengjia Li, Chuong V. Nguyen
Image-based 3D reconstruction, or 3D photogrammetry, of small-scale objects such as insects and biological specimens is challenging due to the use of high-magnification lenses with inherently limited depth of field, and due to the objects' fine structures and complex surface properties. Because of these challenges, traditional 3D reconstruction techniques cannot be applied without suitable image pre-processing. One such pre-processing technique is multifocus stacking, which combines a set of partially focused images captured from the same viewing angle to create a single in-focus image. Traditional multifocus image capture uses a camera on a macro rail. Furthermore, the resulting scale and shift between images are not properly accounted for by multifocus stacking techniques. As a consequence, the resulting in-focus images contain artifacts that violate perspective image formation, and a 3D reconstruction using such images will fail to produce an accurate 3D model of the object. This paper shows how this problem can be solved effectively by a new multifocus stacking procedure that includes a new Fixed-Lens Multifocus Capture and camera calibration for image scale and shift. Initial experimental results confirm our expectation and show that the camera poses of fixed-lens images are at least 3 times less noisy than those of conventional moving-lens images.
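A sketch of a basic multifocus stacking step: for each pixel, keep the value from the frame with the highest local sharpness (Laplacian magnitude). This illustrates only the stacking idea; the paper's Fixed-Lens Multifocus Capture and per-image scale/shift calibration are not reproduced.

```python
# Naive focus stack over a set of already-aligned focus-bracketed frames.
import cv2
import numpy as np

def focus_stack(frames):
    """frames: list of aligned BGR images taken at different focus depths."""
    sharpness = []
    for f in frames:
        gray = cv2.cvtColor(f, cv2.COLOR_BGR2GRAY)
        lap = np.abs(cv2.Laplacian(gray, cv2.CV_64F, ksize=3))
        sharpness.append(cv2.GaussianBlur(lap, (9, 9), 0))  # smooth sharpness map
    idx = np.argmax(np.stack(sharpness), axis=0)   # best-focused frame per pixel
    stack = np.stack(frames)                        # (n, H, W, 3)
    rows, cols = np.indices(idx.shape)
    return stack[idx, rows, cols]                   # composite in-focus image
```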
{"title":"Perspective-Consistent Multifocus Multiview 3D Reconstruction of Small Objects","authors":"Hengjia Li, Chuong V. Nguyen","doi":"10.1109/DICTA47822.2019.8946006","DOIUrl":"https://doi.org/10.1109/DICTA47822.2019.8946006","url":null,"abstract":"Image-based 3D reconstruction or 3D photogrammetry of small-scale objects including insects and biological specimens is challenging due to the use of high magnification lens with inherent limited depth of field, and the object's fine structures and complex surface properties. Due to these challenges, traditional 3D reconstruction techniques cannot be applied without suitable image pre-processings. One such preprocessing technique is multifocus stacking that combines a set of partially focused images captured from the same viewing angle to create a single in-focus image. Traditional multifocus image capture uses a camera on a macro rail. Furthermore, the scale and shift are not properly considered by multifocus stacking techniques. As a consequence, the resulting in-focus images contain artifacts that violate perspective image formation. A 3D reconstruction using such images will fail to produce an accurate 3D model of the object. This paper shows how this problem can be solved effectively by a new multifocus stacking procedure which includes a new Fixed-Lens Multifocus Capture and camera calibration for image scale and shift. Initial experimental results are presented to confirm our expectation and show that the camera poses of fixed-lens images are at least 3-times less noisy than those of conventional moving lens images.","PeriodicalId":6696,"journal":{"name":"2019 Digital Image Computing: Techniques and Applications (DICTA)","volume":"5 1","pages":"1-8"},"PeriodicalIF":0.0,"publicationDate":"2019-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"89370970","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 4
Adult or Child: Recognizing through Touch Gestures on Smartphones
Pub Date: 2019-12-01 DOI: 10.1109/DICTA47822.2019.8945816
Osama Rasheed, A. Rextin, Mehwish Nasim
With the increasing popularity of smartphones, and with their audience including children as young as 2 years old, smartphones can be a hazard for young children in terms of health concerns, time wastage and viewing of inappropriate material; conversely, children who are too young can also be a threat to the smartphone, e.g., by draining the battery, making unwanted calls or text messages, or doing physical damage. In order to protect the smartphone and the child from each other, we require user identification on our devices, so that the device can perform certain functions, for instance restricting adult content, once a user is identified as a child. This paper presents a user study that aims at detecting the touch patterns of adults and children. To this end, we collected data from 60 people, 30 adults and 30 children, who were asked to perform the 6 basic tasks performed on touch devices, in order to find the differences between the touch gestures of children and adults. We first perform an exploratory data analysis. We then model the problem as a supervised binary classification problem and use the data as input to different machine learning algorithms to determine whether we can classify a user previously unknown to the machine as an adult or a child. Our work shows that there are differences in the touch gestures of children and adults which are sufficient for user group identification.
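A sketch of the supervised classification stage under stated assumptions: hand-crafted per-gesture features fed to a binary classifier. The stand-in feature matrix, the feature names and the choice of a random forest are illustrative; the study evaluates several machine learning algorithms.

```python
# Toy adult-vs-child gesture classifier on placeholder features.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
# Stand-in features per gesture, e.g. swipe speed, pressure, touch area, duration.
X = rng.random((60, 4))
y = np.array([0] * 30 + [1] * 30)  # 0 = adult, 1 = child (30 of each, as in the study)

clf = RandomForestClassifier(n_estimators=200, random_state=0)
# Note: splitting by subject rather than by gesture would be the stricter protocol.
print(cross_val_score(clf, X, y, cv=5).mean())
```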
{"title":"Adult or Child: Recognizing through Touch Gestures on Smartphones","authors":"Osama Rasheed, A. Rextin, Mehwish Nasim","doi":"10.1109/DICTA47822.2019.8945816","DOIUrl":"https://doi.org/10.1109/DICTA47822.2019.8945816","url":null,"abstract":"With the increasing popularity of smartphones and its audience including children as young as 2 year old, smartphones can be a hazard for young children in terms of health concerns, time wastage, viewing of inappropriate material and conversely children who are too young can be a threat to the smartphone as well e.g, causing battery drainage, making unwanted calls/text messages, doing physical damage etc. In order to protect the smartphone and children from each other, we require user identification on our devices so the device could perform certain functions for instance restricting adult content once a user is identified as a child. This paper is a user study that aims at detecting the touch patterns of adults and children. To this end we collected data from 60 people, 30 adults and 30 children while they were asked to perform the 6 basic tasks that are performed on touch devices to find the differences in the touch gestures of children from adults. We first perform an exploratory data analysis. We then model the problem as a supervised binary classification problem and use the data as input for different machine learning algorithms to find whether we can classify a user previously unknown to the machine as an adult or a child. Our work shows there are differences in touch gestures among children and adults which are sufficient for user group identification.","PeriodicalId":6696,"journal":{"name":"2019 Digital Image Computing: Techniques and Applications (DICTA)","volume":"250 1","pages":"1-8"},"PeriodicalIF":0.0,"publicationDate":"2019-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"78358532","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1
Automatic Generation of Lymphoma Post-Treatment PETs using Conditional-GANs
Pub Date: 2019-12-01 DOI: 10.1109/DICTA47822.2019.8945835
G. Silva, Inês Domingues, Hugo Duarte, João A. M. Santos
Positron emission tomography (PET) imaging is a nuclear medicine functional imaging technique; as such, it is expensive to perform and subjects the human body to radiation. It would therefore be ideal to find a technique that allows these images to be generated automatically. This generation can be done using deep learning techniques, more specifically with generative adversarial networks. As far as we are aware, there have been no attempts at PET-to-PET generation to date. The objective of this article is to develop a generative adversarial network capable of generating post-treatment PET images from pre-treatment PET images. In order to develop this model, PET scans, originally in 3D, were converted to 2D images. Two methods were used: hand-picking each slice, and maximum intensity projection. After extracting the slices, several image co-registration techniques were applied in order to find which one would produce the best results according to two metrics, peak signal-to-noise ratio and structural similarity index; values of 18.8 and 0.856, respectively, were achieved using data from 90 patients with Hodgkin's Lymphoma.
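A skeleton of a conditional-GAN (pix2pix-style) training step for mapping a pre-treatment PET slice to a post-treatment slice, assuming an adversarial-plus-L1 generator loss. The generator and discriminator internals and the L1 weight are assumptions; only the conditional wiring is shown.

```python
# One training step of a conditional GAN: the discriminator sees (input, target)
# pairs, so the generator is pushed to produce post-treatment slices consistent
# with the conditioning pre-treatment slice.
import tensorflow as tf

bce = tf.keras.losses.BinaryCrossentropy(from_logits=True)

@tf.function
def train_step(generator, discriminator, g_opt, d_opt, pre, post, l1_weight=100.0):
    with tf.GradientTape() as g_tape, tf.GradientTape() as d_tape:
        fake = generator(pre, training=True)
        real_logits = discriminator(tf.concat([pre, post], axis=-1), training=True)
        fake_logits = discriminator(tf.concat([pre, fake], axis=-1), training=True)
        d_loss = bce(tf.ones_like(real_logits), real_logits) + \
                 bce(tf.zeros_like(fake_logits), fake_logits)
        g_loss = bce(tf.ones_like(fake_logits), fake_logits) + \
                 l1_weight * tf.reduce_mean(tf.abs(post - fake))  # L1 term assumed
    d_opt.apply_gradients(zip(d_tape.gradient(d_loss, discriminator.trainable_variables),
                              discriminator.trainable_variables))
    g_opt.apply_gradients(zip(g_tape.gradient(g_loss, generator.trainable_variables),
                              generator.trainable_variables))
    return g_loss, d_loss
```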
{"title":"Automatic Generation of Lymphoma Post-Treatment PETs using Conditional-GANs","authors":"G. Silva, Inês Domingues, Hugo Duarte, João A. M. Santos","doi":"10.1109/DICTA47822.2019.8945835","DOIUrl":"https://doi.org/10.1109/DICTA47822.2019.8945835","url":null,"abstract":"Positron emission tomography (PET) imaging is a nuclear medicine functional imaging technique and as such it is expensive to perform and subjects the human body to radiation. Therefore, it would be ideal to find a technique that could allow for these images to be generated automatically. This generation can be done using deep learning techniques, more specifically with generative adversarial networks. As far as we are aware there have been no attempts at PET-to-PET generation to date. The objective of this article is to develop a generative adversarial network capable of generating after-treatment PET images from pre-treatment PET images. In order to develop this model, PET scans, originally in 3D, were converted to 2D images. Two methods were used, hand picking each slice and maximum intensity projection. After extracting the slices, several image co-registration techniques were applied in order to find which one would produce the best results according to two metrics, peak signal-to-noise ratio and structural similarity index. They achieved results of 18.8 and 0.856, respectively, using data from 90 patients with Hodgkin's Lymphoma.","PeriodicalId":6696,"journal":{"name":"2019 Digital Image Computing: Techniques and Applications (DICTA)","volume":"4 1","pages":"1-6"},"PeriodicalIF":0.0,"publicationDate":"2019-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"74253307","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 3
Improved Detection for WAMI using Background Contextual Information
Pub Date: 2019-12-01 DOI: 10.1109/DICTA47822.2019.8945924
Elena M. Vella, Anee Azim, H. Gaetjens, Boris Repasky, Timothy Payne
Current vehicle detection and tracking in imagery characterised by large ground coverage, low resolution and low frame rate, such as Wide Area Motion Imagery (WAMI), does not reliably sustain vehicle tracks through start-stop movement profiles. This limits the continuity of tracks and their usefulness in higher-level analysis such as pattern-of-behaviour or activity analysis. We develop and implement a two-step registration method to create well-registered images, which are used to generate a novel low-noise representation of the static background context that is fed into our Context Convolutional Neural Network (C-CNN) detector. This network is unique in that the C-CNN learns changing features in the scene and thus produces reliable, sustained vehicle detection independent of motion. A quantitative evaluation against WAMI imagery is presented for a Region of Interest (ROI) of the WPAFB 2009 annotated dataset [1]. We apply a Kalman filter tracker with WAMI-specific adaptations to the single-frame C-CNN detections, and evaluate the results with respect to the tracking ground truth. We show improved detection and sustained tracking in WAMI using static background contextual information, and reliably detect all vehicles that move, including vehicles that become stationary for short periods of time as they move through stop-start manoeuvres.
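A sketch of the pre-processing idea, under assumptions: register each frame to a reference via a feature-based homography, then build a low-noise static-background estimate (here a simple running median over registered frames). The paper's two-step registration and the C-CNN detector itself are not reproduced.

```python
# Frame registration plus median background model for WAMI-like imagery.
import cv2
import numpy as np

def register(frame, ref):
    """Warp a grayscale frame onto a reference view via ORB + RANSAC homography."""
    orb = cv2.ORB_create(4000)
    k1, d1 = orb.detectAndCompute(frame, None)
    k2, d2 = orb.detectAndCompute(ref, None)
    matches = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True).match(d1, d2)
    src = np.float32([k1[m.queryIdx].pt for m in matches]).reshape(-1, 1, 2)
    dst = np.float32([k2[m.trainIdx].pt for m in matches]).reshape(-1, 1, 2)
    H, _ = cv2.findHomography(src, dst, cv2.RANSAC, 3.0)
    return cv2.warpPerspective(frame, H, ref.shape[1::-1])

def background(registered_frames):
    """Per-pixel median suppresses movers, leaving a low-noise static background."""
    return np.median(np.stack(registered_frames), axis=0).astype(np.uint8)
```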
{"title":"Improved Detection for WAMI using Background Contextual Information","authors":"Elena M. Vella, Anee Azim, H. Gaetjens, Boris Repasky, Timothy Payne","doi":"10.1109/DICTA47822.2019.8945924","DOIUrl":"https://doi.org/10.1109/DICTA47822.2019.8945924","url":null,"abstract":"Current vehicle detection and tracking in imagery characterised by large ground coverage, low resolution and low frame rate data, such as Wide Area Motion Imagery (WAMI), does not reliably sustain vehicle tracks through start-stop movement profiles. This limits the continuity of tracks and its usefulness in higher level analysis such as pattern of behaviour or activity analysis. We develop and implement a two-step registration method to create well-registered images which are used to generate a novel low-noise representation of the static background context which is fed into our Context Convolutional Neural Network (C-CNN) detector. This network is unique as the C-CCN learns changing features in the scene and thus produces reliable, sustained vehicle detection independent of motion. A quantitative evaluation against WAMI imagery is presented for a Region of Interest (ROI) of the WPAFB 2009 annotated dataset [1]. We apply a Kalman filter tracker with WAMI-specific adaptions to the single frame C-CNN detections, and evaluate the results with respect to the tracking ground truth. We show improved detection and sustained tracking in WAMI using static background contextual information and reliably detect all vehicles that move, including vehicles that become stationary for short periods of time as they move through stop-start manoeuvres.","PeriodicalId":6696,"journal":{"name":"2019 Digital Image Computing: Techniques and Applications (DICTA)","volume":"145 1","pages":"1-9"},"PeriodicalIF":0.0,"publicationDate":"2019-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"80461478","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 3
Single View 3D Point Cloud Reconstruction using Novel View Synthesis and Self-Supervised Depth Estimation
Pub Date: 2019-12-01 DOI: 10.1109/DICTA47822.2019.8945841
A. Johnston, G. Carneiro
Capturing large amounts of accurate and diverse 3D data for training is often time-consuming and expensive, requiring either many hours of artist time to model each object or scanning of real-world objects using depth sensors or structure-from-motion techniques. To address this problem, we present a method for reconstructing 3D textured point clouds from single input images without any 3D ground-truth training data. We recast the problem of 3D point cloud estimation as that of performing two separate processes: a novel view synthesis and a depth/shape estimation from the novel-view images. To train our models we leverage recent advances in deep generative modelling and self-supervised learning. We show that our method outperforms recent supervised methods, and achieves state-of-the-art results when compared with another recently proposed unsupervised method. Furthermore, we show that our method is capable of recovering textural information which is often missing from previous approaches that rely on supervision.
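A sketch of the final geometric step implied by the abstract: back-projecting a predicted depth map into a 3D point cloud using pinhole intrinsics. The view-synthesis and self-supervised depth networks that would produce the depth map are assumed to exist upstream.

```python
# Back-project a per-pixel depth map to camera-frame XYZ points.
import numpy as np

def depth_to_point_cloud(depth, fx, fy, cx, cy):
    """depth: (H, W) array of per-pixel depths; returns (H*W, 3) XYZ points."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))  # pixel coordinates
    z = depth
    x = (u - cx) * z / fx   # inverse pinhole projection
    y = (v - cy) * z / fy
    return np.stack([x, y, z], axis=-1).reshape(-1, 3)
```

Pairing each 3D point with the RGB value of its source pixel then yields the textured point cloud the paper targets.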
{"title":"Single View 3D Point Cloud Reconstruction using Novel View Synthesis and Self-Supervised Depth Estimation","authors":"A. Johnston, G. Carneiro","doi":"10.1109/DICTA47822.2019.8945841","DOIUrl":"https://doi.org/10.1109/DICTA47822.2019.8945841","url":null,"abstract":"Capturing large amounts of accurate and diverse 3D data for training is often time consuming and expensive, either requiring many hours of artist time to model each object, or to scan from real world objects using depth sensors or structure from motion techniques. To address this problem, we present a method for reconstructing 3D textured point clouds from single input images without any 3D ground truth training data. We recast the problem of 3D point cloud estimation as that of performing two separate processes, a novel view synthesis and a depth/shape estimation from the novel view images. To train our models we leverage the recent advances in deep generative modelling and self-supervised learning. We show that our method outperforms recent supervised methods, and achieves state of the art results when compared with another recently proposed unsupervised method. Furthermore, we show that our method is capable of recovering textural information which is often missing from many previous approaches that rely on supervision.","PeriodicalId":6696,"journal":{"name":"2019 Digital Image Computing: Techniques and Applications (DICTA)","volume":"75 1","pages":"1-8"},"PeriodicalIF":0.0,"publicationDate":"2019-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"73299191","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1
Deep Fusion Net for Coral Classification in Fluorescence and Reflectance Images
Pub Date: 2019-12-01 DOI: 10.1109/DICTA47822.2019.8945925
Uzair Nadeem, Bennamoun, Ferdous Sohel, R. Togneri
Coral reefs are vital for the marine ecosystem and the fishing industry, and automatic classification of corals is essential for the preservation and study of coral reefs. However, significant intra-class variation and inter-class similarity among coral genera, as well as the challenges of underwater illumination, present a great hindrance to automatic classification. We propose an end-to-end trainable Deep Fusion Net for the classification of corals from two types of images. The network takes two simultaneous inputs, a reflectance image and a fluorescence image, and is composed of three branches: Reflectance, Fluorescence and Integration. The branches are first trained individually and then fused together. Finally, the Deep Fusion Net is trained end-to-end for the classification of different coral genera and other non-coral classes. Experiments on the challenging Eilat Fluorescence Coral dataset show that the Deep Fusion Net achieves superior classification accuracy compared to other methods.
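A minimal sketch of a two-input fusion network in the spirit of the one described: separate convolutional branches for the reflectance and fluorescence images, concatenated by an integration head. Layer sizes, input resolution and the class count are assumptions, and the paper's branch-wise pre-training stage is omitted.

```python
# Two-branch fusion classifier over paired reflectance/fluorescence inputs.
import tensorflow as tf

def branch(name):
    inp = tf.keras.Input((64, 64, 3), name=name)          # input size assumed
    x = tf.keras.layers.Conv2D(32, 3, activation="relu")(inp)
    x = tf.keras.layers.MaxPooling2D()(x)
    x = tf.keras.layers.Conv2D(64, 3, activation="relu")(x)
    x = tf.keras.layers.GlobalAveragePooling2D()(x)
    return inp, x

r_in, r_feat = branch("reflectance")
f_in, f_feat = branch("fluorescence")
merged = tf.keras.layers.Concatenate()([r_feat, f_feat])  # integration branch
out = tf.keras.layers.Dense(10, activation="softmax")(merged)  # class count assumed
model = tf.keras.Model([r_in, f_in], out)
model.compile("adam", "sparse_categorical_crossentropy", metrics=["accuracy"])
```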
{"title":"Deep Fusion Net for Coral Classification in Fluorescence and Reflectance Images","authors":"Uzair Nadeem, Bennamoun, Ferdous Sohel, R. Togneri","doi":"10.1109/DICTA47822.2019.8945925","DOIUrl":"https://doi.org/10.1109/DICTA47822.2019.8945925","url":null,"abstract":"Coral reefs are vital for marine ecosystem and fishing industry. Automatic classification of corals is essential for the preservation and study of coral reefs. However, significant intra-class variations and inter-class similarity among coral genera, as well as the challenges of underwater illumination present a great hindrance for the automatic classification. We propose an end-to-end trainable Deep Fusion Net for the classification of corals from two types of images. The network takes two simultaneous inputs of reflectance and fluorescence images. It is composed of three branches: Reflectance, Fluorescence and Integration. The branches are first trained individually and then fused together. Finally, the Deep Fusion Net is trained end-to-end for the classification of different coral genera and other non-coral classes. Experiments on the challenging Eliat Fluorescence Coral dataset show that the Deep Fusion net achieves superior classification accuracy compared to other methods.","PeriodicalId":6696,"journal":{"name":"2019 Digital Image Computing: Techniques and Applications (DICTA)","volume":"41 1","pages":"1-7"},"PeriodicalIF":0.0,"publicationDate":"2019-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"77496942","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 3
Constructing Synthetic Chorio-Retinal Patches using Generative Adversarial Networks
Pub Date: 2019-12-01 DOI: 10.1109/DICTA47822.2019.8946089
J. Kugelman, D. Alonso-Caneiro, Scott A. Read, Stephen J. Vincent, F. Chen, M. Collins
The segmentation of tissue layers in optical coherence tomography (OCT) images of the internal lining of the eye (the retina and choroid) is commonly performed for clinical and research purposes. However, manual segmentation of the numerous scans is time-consuming, tedious and error-prone. Fortunately, machine learning-based automated approaches for image segmentation tasks are becoming more common, although poor performance of these methods can result from a lack of quantity or diversity in the data used to train the models. Recently, generative adversarial networks (GANs) have demonstrated the ability to generate synthetic images, which may be useful for data augmentation purposes. Here, we propose the application of GANs to construct chorio-retinal patches from OCT images, which may be used to augment data for a patch-based approach to boundary segmentation. Given the complexity of GAN training, a range of experiments are performed to optimize performance. We show that it is feasible to generate 32×32 versions of such patches that are visually indistinguishable from their real counterparts. In the best case, the segmentation performance when utilizing solely synthetic data to train the model is nearly comparable to that with real data on all three layer boundaries of interest. The differences in mean absolute error for the inner boundary of the inner limiting membrane (ILM) [0.50 vs. 0.48 pixels], the outer boundary of the retinal pigment epithelium (RPE) [0.48 vs. 0.44 pixels] and the choroid-scleral interface (CSI) [4.42 vs. 4.00 pixels] show the performance using synthetic data to be only marginally inferior. These findings highlight the potential use of GANs for data augmentation in future work with chorio-retinal OCT images.
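A sketch of a DCGAN-style generator for 32×32 patches, the patch size the abstract reports as feasible. The latent size, filter counts and single-channel output are assumptions; the paper's exact GAN variant is not reproduced.

```python
# Generator that upsamples a latent vector to a 32x32 single-channel patch.
import tensorflow as tf

def build_generator(latent_dim=128):  # latent size assumed
    return tf.keras.Sequential([
        tf.keras.layers.Dense(4 * 4 * 256, input_shape=(latent_dim,)),
        tf.keras.layers.Reshape((4, 4, 256)),
        tf.keras.layers.Conv2DTranspose(128, 4, strides=2, padding="same",
                                        activation="relu"),   # 8x8
        tf.keras.layers.Conv2DTranspose(64, 4, strides=2, padding="same",
                                        activation="relu"),   # 16x16
        tf.keras.layers.Conv2DTranspose(1, 4, strides=2, padding="same",
                                        activation="tanh"),   # 32x32 patch
    ])

g = build_generator()
print(g(tf.random.normal((8, 128))).shape)  # (8, 32, 32, 1)
```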
{"title":"Constructing Synthetic Chorio-Retinal Patches using Generative Adversarial Networks","authors":"J. Kugelman, D. Alonso-Caneiro, Scott A. Read, Stephen J. Vincent, F. Chen, M. Collins","doi":"10.1109/DICTA47822.2019.8946089","DOIUrl":"https://doi.org/10.1109/DICTA47822.2019.8946089","url":null,"abstract":"The segmentation of tissue layers in optical coherence tomography (OCT) images of the internal lining of the eye (the retina and choroid) is commonly performed for clinical and research purposes. However, manual segmentation of the numerous scans is time consuming, tedious and error-prone. Fortunately, machine learning-based automated approaches for image segmentation tasks are becoming more common. However, poor performance of these methods can result from a lack of quantity or diversity in the data used to train the models. Recently, generative adversarial networks (GANs) have demonstrated the ability to generate synthetic images, which may be useful for data augmentation purposes. Here, we propose the application of GANs to construct chorio-retinal patches from OCT images which may be used to augment data for a patch-based approach to boundary segmentation. Given the complexity of GAN training, a range of experiments are performed to optimize performance. We show that it is feasible to generate 32×32 versions of such patches that are visually indistinguishable from their real variants. In the best case, the segmentation performance utilizing solely synthetic data to train the model is nearly comparable to real data on all three layer boundaries of interest. The difference in mean absolute error for the inner boundary of the inner limiting membrane (ILM) [0.50 vs. 0.48 pixels], outer boundary of the retinal pigment epithelium (RPE) [0.48 vs. 0.44 pixels] and choroid-scleral interface (CSI) [4.42 vs. 4.00 pixels] shows the performance using synthetic data to be only marginally inferior. These findings highlight the potential use of GANs for data augmentation in future work with chorio-retinal OCT images.","PeriodicalId":6696,"journal":{"name":"2019 Digital Image Computing: Techniques and Applications (DICTA)","volume":"153 1","pages":"1-8"},"PeriodicalIF":0.0,"publicationDate":"2019-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"91473654","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 10
Insect-Inspired Small Moving Target Enhancement in Infrared Videos
Pub Date: 2019-12-01 DOI: 10.1109/DICTA47822.2019.8946002
M. Uzair, R. Brinkworth, A. Finn
Thermal infrared imaging is an effective modality for developing robust methods of small-target detection at large distances. However, low target contrast and high background clutter are two main challenges that limit detection performance. We present bio-inspired spatio-temporal pre-processing of infrared video frames to deal with these challenges. The neurons in the early vision system of small flying insects have a remarkable capability for noise filtering, contrast enhancement, signal compression and clutter suppression. These neurons were previously modeled computationally in two stages, using a combination of linear and non-linear processing layers. The first stage models the adaptive temporal filtering mechanisms of insect photoreceptor cells; it improves the signal-to-noise ratio, enhances target-background discrimination and expands the possible range of signal variability. The second stage models the spatio-temporal adaptive filtering in the large monopolar cells, which removes redundancy and increases target contrast. To show the performance gain achieved by such bio-inspired pre-processing, we perform small-target detection experiments on real-world high-bit-depth infrared video sequences. Results show that the early-vision-based pre-processing significantly improves the performance of four standard infrared small moving target detection techniques. Specifically, the spatio-temporal pre-processing increases the detection rate (at a 10⁻⁵ false alarm rate) of the best performing method by 100%, and by up to 630% for the other methods. Our results are indicative of the strong potential of bio-inspired processing to allow systems to detect smaller targets at longer distances in more cluttered environments.
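A toy sketch of the flavour of the first processing stage: a per-pixel low-pass estimate of local brightness used for adaptive gain control (Weber-like normalisation), followed by a temporal high-pass that emphasises change from frame to frame. The time constant is assumed, and the paper's full multi-stage neuronal model is considerably richer than this.

```python
# Photoreceptor-flavoured temporal pre-processing of an infrared sequence.
import numpy as np

def enhance(frames, alpha=0.05, eps=1e-3):
    """frames: iterable of float32 (H, W) images; yields enhanced frames."""
    mean = None   # slow per-pixel brightness estimate (low-pass)
    prev = None   # previous gain-controlled frame (for the high-pass)
    for f in frames:
        mean = f.copy() if mean is None else (1 - alpha) * mean + alpha * f
        normed = f / (mean + eps)                        # adaptive gain control
        out = normed if prev is None else normed - prev  # temporal high-pass
        prev = normed
        yield out
```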
{"title":"Insect-Inspired Small Moving Target Enhancement in Infrared Videos","authors":"M. Uzair, R. Brinkworth, A. Finn","doi":"10.1109/DICTA47822.2019.8946002","DOIUrl":"https://doi.org/10.1109/DICTA47822.2019.8946002","url":null,"abstract":"Thermal infrared imaging is an effective modality for developing robust methods of small target detection at large distances. However, low target contrast and high background clutter are two main challenges that limit the detection performance. We present bio-inspired spatio-temporal pre-processing of infrared video frames to deal with such challenges. The neurons in the early vision system of small flying insects have remarkable capability for noise filtering, contrast enhancement, signal compression and clutter suppression. These neurons were computationally modeled previously in two stages using a combination of linear and non-linear processing layers. The first stage models the adaptive temporal filtering mechanisms of insect photoreceptor cells. It improves the signal-to-noise-ratio, enhances target background discrimination and expands the possible range of signal variability. The second stage models the spatio-temporal adaptive filtering in the large monopolar cells that remove redundancy and increase target contrast. To show the performance gain achieved by such bio-inspired preprocessing, we perform small target detection experiments on real world high bit-depth infrared video sequences. Results show that the early biological vision based pre-processing significantly improves the performance of four standard infrared small moving target detection techniques. Specifically, the spatio-temporal preprocessing increase the detection rate (at 10−5 false alarm rate) of the best performing method by 100% and by up to 630% for the other methods. Our results are indicative of the strong potential of the bio-processing for allowing systems to detect smaller targets at longer distances in more cluttered environments.","PeriodicalId":6696,"journal":{"name":"2019 Digital Image Computing: Techniques and Applications (DICTA)","volume":"33 1","pages":"1-8"},"PeriodicalIF":0.0,"publicationDate":"2019-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"87356475","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 3