
Latest publications from the 2020 35th International Conference on Image and Vision Computing New Zealand (IVCNZ)

Real Time Ray Tracing of Analytic and Implicit Surfaces
Pub Date: 2020-11-25 DOI: 10.1109/IVCNZ51579.2020.9290653
Finn Petrie, S. Mills
Real-time ray tracing debuted on consumer GPU hardware in 2018. The primary examples, however, have been hybrid raster and ray-tracing methods that are restricted to triangle-mesh geometry. Our research looks at the viability of procedural methods in the real-time setting. We give implementations of analytic and implicit geometry within the global illumination algorithms bi-directional path tracing and GPU photon mapping, both of which we have adapted to the new ray-tracing shader stages, as shown in Figure 1. Although procedural intersections are more expensive than triangle intersections on Nvidia's RTX hardware, our results show that these descriptions still run at interactive rates within computationally expensive multi-pass ray-traced global illumination, demonstrating the practical benefits of procedural geometry.
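An analytic surface here means the intersection shader evaluates a closed-form equation instead of testing triangles. As a minimal sketch of that idea, and not the authors' shader code, the following Python shows a ray-sphere intersection; the function name and argument conventions are illustrative, and a normalised ray direction is assumed.

```python
import math

def ray_sphere_intersect(origin, direction, center, radius):
    """Analytic ray-sphere test: solve |o + t*d - c|^2 = r^2 for t.

    Assumes `direction` is normalised, so the quadratic reduces to
    t^2 + 2*b*t + c = 0 with b = d.(o - c).
    """
    oc = [p - q for p, q in zip(origin, center)]
    b = sum(d * v for d, v in zip(direction, oc))
    c = sum(v * v for v in oc) - radius * radius
    disc = b * b - c
    if disc < 0.0:
        return None                      # ray misses the sphere
    t = -b - math.sqrt(disc)             # nearer of the two roots
    return t if t > 0.0 else None        # only count hits in front of the origin

# Unit sphere at the origin, ray fired down the -z axis: hit at t = 4.
print(ray_sphere_intersect((0, 0, 5), (0, 0, -1), (0, 0, 0), 1.0))
```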
Citations: 1
Introducing Transfer Learning to 3D ResNet-18 for Alzheimer's Disease Detection on MRI Images
Pub Date: 2020-11-25 DOI: 10.1109/IVCNZ51579.2020.9290616
Amir Ebrahimi, S. Luo, R. Chiong
This paper focuses on detecting Alzheimer's Disease (AD) using the ResNet-18 model on Magnetic Resonance Imaging (MRI). Previous studies have applied different 2D Convolutional Neural Networks (CNNs) to detect AD. The main idea is to split 3D MRI scans into 2D image slices so that classification can be performed on the image slices independently, which allows researchers to benefit from the concept of transfer learning. However, 2D CNNs are incapable of understanding the relationship among 2D image slices in a 3D MRI scan. One solution is to employ 3D CNNs instead of 2D ones. In this paper, we propose a method to utilise transfer learning in 3D CNNs, which allows the transfer of knowledge from 2D image datasets to a 3D image dataset. Both 2D and 3D CNNs are compared in this study, and our results show that introducing transfer learning to a 3D CNN improves the accuracy of an AD detection system. After using an optimisation method in the training process, our approach achieved 96.88% accuracy, 100% sensitivity, and 93.75% specificity.
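The abstract does not specify how the 2D pretrained weights are carried into the 3D network, so the sketch below shows one common mechanism, "inflating" each pretrained 2D kernel along the depth axis, as a hedged illustration; the helper inflate_conv2d_to_3d and the depth of 3 are assumptions, not the authors' method.

```python
import torch
import torch.nn as nn

def inflate_conv2d_to_3d(conv2d: nn.Conv2d, depth: int = 3) -> nn.Conv3d:
    """Build a 3D conv whose response to a depth-constant input matches
    the pretrained 2D conv, by repeating the kernel along depth."""
    conv3d = nn.Conv3d(conv2d.in_channels, conv2d.out_channels,
                       kernel_size=(depth, *conv2d.kernel_size),
                       stride=(1, *conv2d.stride),
                       padding=(depth // 2, *conv2d.padding),
                       bias=conv2d.bias is not None)
    with torch.no_grad():
        w2d = conv2d.weight                                   # (out, in, kH, kW)
        # Repeat along depth and divide by depth to preserve the activation scale.
        conv3d.weight.copy_(w2d.unsqueeze(2).repeat(1, 1, depth, 1, 1) / depth)
        if conv2d.bias is not None:
            conv3d.bias.copy_(conv2d.bias)
    return conv3d

# Example: inflate the first layer of a 2D ResNet-18 (in practice one would
# load ImageNet-pretrained weights first).
from torchvision.models import resnet18
conv3d = inflate_conv2d_to_3d(resnet18().conv1)
print(conv3d.weight.shape)  # torch.Size([64, 3, 3, 7, 7])
```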
Citations: 35
A Graph-Based Approach to Automatic Convolutional Neural Network Construction for Image Classification
Pub Date: 2020-11-25 DOI: 10.1109/IVCNZ51579.2020.9290492
Gonglin Yuan, Bing Xue, Mengjie Zhang
Convolutional neural networks (CNNs) have achieved great success in the field of image classification in recent years. Usually, human experts are needed to design CNN architectures for different tasks. Evolutionary neural architecture search can find optimal CNN architectures automatically; however, previous representations of CNN architectures in evolutionary algorithms have many restrictions. In this paper, we propose a new, flexible representation based on directed acyclic graphs to encode CNN architectures, and develop a genetic algorithm (GA) based evolutionary architecture search in which the depth of candidate CNNs can vary. Furthermore, we design new crossover and mutation operators that can be performed on individuals of different lengths. The proposed algorithm is evaluated on five widely used datasets. The experimental results show that it achieves very competitive performance against its peer competitors in terms of classification accuracy and number of parameters.
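As a toy illustration of operators that act on individuals of different lengths (the paper's actual operators work on DAG encodings and are more involved), the sketch below crosses over and mutates simple variable-length layer lists; LAYER_CHOICES and the probabilities are made up for the example.

```python
import random

LAYER_CHOICES = ["conv3x3", "conv5x5", "pool", "skip"]

def crossover(parent_a, parent_b):
    """One-point crossover with independent cut points, so children may
    have different lengths than either parent."""
    cut_a = random.randint(1, len(parent_a) - 1)
    cut_b = random.randint(1, len(parent_b) - 1)
    return parent_a[:cut_a] + parent_b[cut_b:], parent_b[:cut_b] + parent_a[cut_a:]

def mutate(genome, p=0.1):
    """Point mutation plus random insertion/deletion, so depth can vary."""
    genome = [random.choice(LAYER_CHOICES) if random.random() < p else g
              for g in genome]
    if random.random() < p:
        genome.insert(random.randrange(len(genome) + 1), random.choice(LAYER_CHOICES))
    if len(genome) > 2 and random.random() < p:
        del genome[random.randrange(len(genome))]
    return genome

a, b = ["conv3x3", "pool", "conv5x5"], ["skip", "conv3x3", "conv3x3", "pool"]
print(crossover(a, b))
print(mutate(a))
```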
Citations: 0
Shadow-based Light Detection for HDR Environment Maps
Pub Date: 2020-11-25 DOI: 10.1109/IVCNZ51579.2020.9290734
Andrew Chalmers, Taehyun Rhee
High dynamic range (HDR) environment maps (EMs) are spherical textures containing HDR pixels, used for illuminating virtual scenes with high realism. Detecting as few necessary pixels as possible within the EM is important for a variety of tasks, such as real-time rendering and EM database management. To address this, we propose a shadow-based algorithm for detecting the most dominant light sources within an EM. The algorithm takes into account the relative impact of all other light sources within the upper hemisphere of the texture. This is achieved by decomposing an EM into superpixels, sorting the superpixels from brightest to dimmest, and using ℓ0-norm minimisation to keep only those superpixels necessary to maintain the shadow quality of the EM with respect to the just-noticeable-difference (JND) principle. We show that our method improves upon prior methods in detecting as few lights as possible while still preserving the shadow-casting properties of EMs.
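A hedged sketch of the selection step: in place of the paper's ℓ0-norm minimisation, the version below greedily keeps the brightest superpixels until the discarded energy drops below a JND-style threshold. The function name, the energy criterion, and the threshold value are illustrative assumptions.

```python
import numpy as np

def select_light_superpixels(luminance, labels, jnd_fraction=0.2):
    """Keep the brightest superpixels until the energy of the dropped
    ones falls below a fraction of the total (a JND-style cut-off)."""
    ids = np.unique(labels)
    energy = np.array([luminance[labels == i].sum() for i in ids])
    order = np.argsort(energy)[::-1]                 # brightest first
    total, kept, acc = energy.sum(), [], 0.0
    for idx in order:
        if total - acc <= jnd_fraction * total:      # remainder is imperceptible
            break
        kept.append(int(ids[idx]))
        acc += energy[idx]
    return kept

# Toy 4x4 luminance map with four superpixels; the dimmest one is dropped.
lum = np.array([[9, 9, 1, 1], [9, 9, 1, 1], [2, 2, 8, 1], [2, 2, 8, 1]], float)
lab = np.array([[0, 0, 1, 1], [0, 0, 1, 1], [2, 2, 3, 1], [2, 2, 3, 1]])
print(select_light_superpixels(lum, lab))            # [0, 3, 2]
```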
Citations: 0
A Wavefront Sensorless Tip/Tilt Removal method for Correcting Astronomical Images
Pub Date: 2020-11-25 DOI: 10.1109/IVCNZ51579.2020.9290688
P. Taghinia, Vishnu Anand Muruganandan, R. Clare, S. Weddell
Images of astronomical objects captured by ground-based telescopes are distorted by atmospheric turbulence. The phase of the atmospheric aberration is traditionally estimated by a wavefront sensor (WFS), and this information drives a deformable mirror through a control system to restore the image. In this paper, however, we utilise wavefront sensorless (WFSL) methods, in which the wavefront sensor is absent. Given that, for small-aperture telescopes, the largest share of atmospheric turbulence energy is contained in the two axial tilt modes, we use WFSL to remove these two modes specifically. This method is shown to be efficient in terms of both speed and accuracy.
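Tip and tilt displace the whole image rather than blurring it, so a crude sensorless correction can estimate the centroid shift of a star image and shift the frame back. The sketch below illustrates only this intuition and is not the paper's WFSL method.

```python
import numpy as np

def remove_tip_tilt(frame):
    """Re-centre a star image by its intensity centroid, undoing the
    whole-image displacement that tip/tilt causes."""
    h, w = frame.shape
    ys, xs = np.mgrid[0:h, 0:w]
    total = frame.sum()
    cy, cx = (ys * frame).sum() / total, (xs * frame).sum() / total
    dy, dx = int(round(h / 2 - cy)), int(round(w / 2 - cx))
    return np.roll(np.roll(frame, dy, axis=0), dx, axis=1)

# A displaced Gaussian blob is shifted back towards the frame centre.
ys, xs = np.mgrid[0:64, 0:64]
star = np.exp(-((ys - 20.0) ** 2 + (xs - 44.0) ** 2) / 18.0)
print(np.unravel_index(remove_tip_tilt(star).argmax(), star.shape))  # ~(32, 32)
```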
Citations: 1
AI in Photography: Scrutinizing Implementation of Super-Resolution Techniques in Photo-Editors
Pub Date: 2020-11-25 DOI: 10.1109/IVCNZ51579.2020.9290737
Noor-ul-ain Fatima
Judging the quality of a photograph from the perspective of a photographer, we can identify resolution, symmetry, content, location, etc. as some of the factors that influence how good a photograph is. The rapidly growing appeal of photography impels us to discover ways to perfect an input image in terms of these parameters. While content and location are immutable, attributes like symmetry and resolution can be worked on. In this paper, I prioritise resolution, and there are multiple ways to refine it. Image super-resolution is progressively becoming a prerequisite in computer graphics, computer vision, and image processing: it is the process of obtaining high-resolution images from their low-resolution counterparts. In this work, image super-resolution techniques such as interpolation, SRCNN (Super-Resolution Convolutional Neural Network), SRResNet (Super-Resolution Residual Network), and GANs (Generative Adversarial Networks: Super-Resolution GAN, SRGAN, and Conditional GAN, CGAN) were studied experimentally for the post-enhancement of images as employed by photo-editors, establishing the most coherent approach for attaining optimised super-resolution in terms of quality.
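Of the techniques surveyed, SRCNN is the most compact to sketch: bicubic interpolation first upsamples the input, then a three-layer CNN (9x9, 1x1 and 5x5 kernels in the original SRCNN design) refines it. A minimal, untrained PyTorch sketch follows; training data and weights are omitted.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SRCNN(nn.Module):
    def __init__(self, channels=3):
        super().__init__()
        self.patch_extract = nn.Conv2d(channels, 64, kernel_size=9, padding=4)
        self.nonlinear_map = nn.Conv2d(64, 32, kernel_size=1)
        self.reconstruct = nn.Conv2d(32, channels, kernel_size=5, padding=2)

    def forward(self, low_res, scale=2):
        # Interpolation supplies the baseline; the CNN restores detail.
        x = F.interpolate(low_res, scale_factor=scale, mode="bicubic",
                          align_corners=False)
        x = F.relu(self.patch_extract(x))
        x = F.relu(self.nonlinear_map(x))
        return self.reconstruct(x)

print(SRCNN()(torch.rand(1, 3, 32, 32)).shape)  # torch.Size([1, 3, 64, 64])
```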
Citations: 1
Variational Autoencoder for 3D Voxel Compression
Pub Date: 2020-11-25 DOI: 10.1109/IVCNZ51579.2020.9290656
Juncheng Liu, S. Mills, B. McCane
3D scene sensing and understanding is a fundamental task in the fields of computer vision and robotics. One widely used representation for 3D data is the voxel grid. However, explicit representation of 3D voxels requires large storage space, which is not suitable for light-weight applications and scenarios such as robotic navigation and exploration. In this paper we propose a method to compress 3D voxel grids using an octree representation and Variational Autoencoders (VAEs). We first capture a 3D voxel grid, in our application with cooperating Realsense D435 and T265 cameras. The voxel grid is decomposed into three types of octants, which are then compressed by the encoder and reproduced by feeding the latent code into the decoder. We demonstrate the efficiency of our method in two applications: scene reconstruction and path planning.
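A minimal sketch of the compression idea: a VAE maps a fixed-size voxel block to a small latent code (the compressed form) and reproduces the block by decoding that code. The octree decomposition and the three octant types are omitted here, and the block size and latent width are illustrative assumptions.

```python
import torch
import torch.nn as nn

class VoxelVAE(nn.Module):
    def __init__(self, block=8, latent=32):
        super().__init__()
        n = block ** 3
        self.enc = nn.Sequential(nn.Flatten(), nn.Linear(n, 256), nn.ReLU())
        self.mu, self.logvar = nn.Linear(256, latent), nn.Linear(256, latent)
        self.dec = nn.Sequential(nn.Linear(latent, 256), nn.ReLU(),
                                 nn.Linear(256, n), nn.Sigmoid())

    def forward(self, voxels):                      # voxels: (B, block, block, block)
        h = self.enc(voxels)
        mu, logvar = self.mu(h), self.logvar(h)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)  # reparameterise
        return self.dec(z), mu, logvar              # store z; decode to reproduce

recon, mu, logvar = VoxelVAE()(torch.rand(4, 8, 8, 8))
print(recon.shape)  # torch.Size([4, 512]) -- flattened occupancy of each block
```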
Citations: 3
Comparison of Face Detection Algorithms on Mobile Devices
Pub Date: 2020-11-25 DOI: 10.1109/IVCNZ51579.2020.9290542
Yishi Guo, B. Wünsche
Face detection is a fundamental task for many computer vision applications such as access control, security, advertisement, automatic payment, and healthcare. Due to technological advances, mobile robots are becoming increasingly common in such applications (e.g. healthcare and security robots), and consequently there is a need for efficient and effective face detection methods on such platforms. Mobile robots have different hardware configurations and operating conditions from desktop applications, e.g. unreliable network connections and the need for lower power consumption; hence, results for face detection methods on desktop platforms cannot be directly translated to mobile platforms. We compare four common face detection algorithms, Viola-Jones, HOG, MTCNN and MobileNet-SSD, for use in mobile robotics using different face databases. Our results show that for a typical mobile configuration (Nvidia Jetson TX2), MobileNet-SSD performed best, with 90% detection accuracy on the AFW data set and a frame rate of almost 10 fps with GPU acceleration. MTCNN had the highest precision and was superior on more difficult face data sets, but did not achieve real-time performance with the given implementation and hardware configuration.
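A comparison like this can be timed with a simple shared loop; the sketch below runs the two classical detectors (Viola-Jones via OpenCV's Haar cascade, HOG via dlib) over the same frames and reports mean latency. It is a benchmarking sketch, not the paper's evaluation harness, and the deep models would slot into the same dictionary.

```python
import time
import cv2
import dlib

haar = cv2.CascadeClassifier(cv2.data.haarcascades +
                             "haarcascade_frontalface_default.xml")
hog = dlib.get_frontal_face_detector()

detectors = {
    "viola-jones": lambda gray: haar.detectMultiScale(gray, 1.1, 5),
    "hog": lambda gray: hog(gray, 1),
}

def benchmark(frames):
    """Time each detector over the same BGR frames and report fps."""
    for name, detect in detectors.items():
        start = time.perf_counter()
        hits = sum(len(detect(cv2.cvtColor(f, cv2.COLOR_BGR2GRAY)))
                   for f in frames)
        dt = (time.perf_counter() - start) / len(frames)
        print(f"{name}: {hits} faces, {dt * 1000:.1f} ms/frame ({1 / dt:.1f} fps)")
```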
Citations: 1
Predicting Cherry Quality Using Siamese Networks
Pub Date: 2020-11-25 DOI: 10.1109/IVCNZ51579.2020.9290674
Yerren van Sint Annaland, Lech Szymanski, S. Mills
The cherry industry is a rapidly growing sector of New Zealand’s export merchandise and, as such, the accuracy with which pack-houses can grade cherries during processing is becoming increasingly critical. Conventional computer vision systems are usually employed in this process, yet they fall short in many respects, still requiring humans to manually verify the grading. In this work, we investigate the use of deep learning to improve upon the traditional approach. The nature of the industry means that the grade standards are influenced by a range of factors and can change on a daily basis. This makes conventional classification approaches infeasible (as there are no fixed classes) so we construct a model to overcome this. We convert the problem from classification to regression, using a Siamese network trained with pairwise comparison labels. We extract the model embedded within to predict continuous quality values for the fruit. Our model is able to predict which of two similar quality fruit is better with over 88% accuracy, only 5% below the self-agreement of a human expert.
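The pairwise trick can be sketched compactly: one shared scorer network evaluates both images, the difference of the two scores is trained against which-is-better labels (a Bradley-Terry style model), and at inference the scorer alone yields the continuous quality value. The architecture sizes below are illustrative assumptions, not the paper's network.

```python
import torch
import torch.nn as nn

class QualityScorer(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(32, 1))

    def forward(self, x):
        return self.net(x).squeeze(-1)              # continuous quality score

scorer = QualityScorer()                            # shared by both branches
loss_fn = nn.BCEWithLogitsLoss()

def siamese_loss(img_a, img_b, a_is_better):
    # P(a better than b) = sigmoid(score_a - score_b).
    return loss_fn(scorer(img_a) - scorer(img_b), a_is_better.float())

a, b = torch.rand(4, 3, 64, 64), torch.rand(4, 3, 64, 64)
print(siamese_loss(a, b, torch.randint(0, 2, (4,))))
```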
Citations: 3
Human Action Recognition Using Deep Learning Methods
Pub Date: 2020-11-25 DOI: 10.1109/IVCNZ51579.2020.9290594
Zeqi Yu, W. Yan
The goal of human action recognition is to identify and understand the actions of people in videos and to output corresponding labels. In addition to the spatial correlation present in 2D images, actions in a video also have attributes in the temporal domain. The complexity of human actions, e.g. changes of perspective and background noise, affects recognition. To address these problems, three algorithms are designed and implemented in this paper. Based on convolutional neural networks (CNNs), Two-Stream CNN, CNN+LSTM, and 3D CNN are harnessed to identify human actions in videos. Each algorithm is explicated and analysed in detail. The HMDB-51 dataset is used to test the algorithms. The experimental results show that all three methods effectively identify human actions in a given video, and the best algorithm is selected accordingly.
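Of the three designs, CNN+LSTM is the easiest to sketch: a 2D CNN embeds each frame, an LSTM aggregates the embeddings over time, and a linear head predicts one of the 51 HMDB-51 action classes. Layer sizes below are illustrative; the paper's exact networks are not given in the abstract.

```python
import torch
import torch.nn as nn

class CNNLSTM(nn.Module):
    def __init__(self, num_classes=51):             # HMDB-51 has 51 classes
        super().__init__()
        self.cnn = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten())   # per-frame 64-d embedding
        self.lstm = nn.LSTM(64, 128, batch_first=True)
        self.head = nn.Linear(128, num_classes)

    def forward(self, clip):                         # clip: (B, T, 3, H, W)
        b, t = clip.shape[:2]
        feats = self.cnn(clip.flatten(0, 1)).view(b, t, -1)
        _, (h, _) = self.lstm(feats)
        return self.head(h[-1])                      # classify from last state

print(CNNLSTM()(torch.rand(2, 16, 3, 64, 64)).shape)  # torch.Size([2, 51])
```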
Citations: 9