Real Time Ray Tracing of Analytic and Implicit Surfaces
Finn Petrie, S. Mills
Pub Date: 2020-11-25 | DOI: 10.1109/IVCNZ51579.2020.9290653 | 2020 35th International Conference on Image and Vision Computing New Zealand (IVCNZ)
Real-time ray-tracing debuted on consumer GPU hardware in 2018. The primary examples, however, have been hybrid raster and ray-tracing methods that are restricted to triangle-mesh geometry. Our research looks at the viability of procedural methods in the real-time setting. We give implementations of analytic and implicit geometry within the global illumination algorithms bi-directional path-tracing and GPU photon-mapping, both of which we have adapted to the new ray-tracing shader stages, as shown in Figure 1. Although procedural intersections are more expensive than triangle intersections on Nvidia's RTX hardware, our results show that these descriptions still run at interactive rates within computationally expensive multi-pass ray-traced global illumination and demonstrate the practical benefits of the geometry.
{"title":"Real Time Ray Tracing of Analytic and Implicit Surfaces","authors":"Finn Petrie, S. Mills","doi":"10.1109/IVCNZ51579.2020.9290653","DOIUrl":"https://doi.org/10.1109/IVCNZ51579.2020.9290653","url":null,"abstract":"Real-time ray-tracing debuted to consumer GPU hardware in 2018. Primary examples however, have been of hybrid raster and ray-tracing methods that are restricted to triangle mesh geometry. Our research looks at the viability of procedural methods in the real-time setting. We give implementations of analytical and implicit geometry in the domain of the global illumination algorithms bi-directional path-tracing, and GPU Photon-Mapping – both of which we have adapted to the new ray-tracing shader stages, as shown in Figure 1. Despite procedural intersections being more expensive than triangle intersections in Nvidia’s RTX hardware, our results show that these descriptions still run at interactive rates within computationally expensive multi-pass ray-traced global illumination and demonstrate the practical benefits of the geometry.","PeriodicalId":164317,"journal":{"name":"2020 35th International Conference on Image and Vision Computing New Zealand (IVCNZ)","volume":"2 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-11-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123748453","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Introducing Transfer Learning to 3D ResNet-18 for Alzheimer's Disease Detection on MRI Images
Amir Ebrahimi, S. Luo, R. Chiong
Pub Date: 2020-11-25 | DOI: 10.1109/IVCNZ51579.2020.9290616 | 2020 35th International Conference on Image and Vision Computing New Zealand (IVCNZ)
This paper focuses on detecting Alzheimer's Disease (AD) using the ResNet-18 model on Magnetic Resonance Imaging (MRI). Previous studies have applied different 2D Convolutional Neural Networks (CNNs) to detect AD. The main idea is to split 3D MRI scans into 2D image slices so that classification can be performed on the slices independently, which allows researchers to benefit from transfer learning. However, 2D CNNs cannot capture the relationship among the 2D image slices of a 3D MRI scan. One solution is to employ 3D CNNs instead of 2D ones. In this paper, we propose a method to utilise transfer learning in 3D CNNs, which allows the transfer of knowledge from 2D image datasets to a 3D image dataset. Both 2D and 3D CNNs are compared in this study, and our results show that introducing transfer learning to a 3D CNN improves the accuracy of an AD detection system. After using an optimisation method in the training process, our approach achieved 96.88% accuracy, 100% sensitivity, and 93.75% specificity.
{"title":"Introducing Transfer Leaming to 3D ResNet-18 for Alzheimer’s Disease Detection on MRI Images","authors":"Amir Ebrahimi, S. Luo, R. Chiong","doi":"10.1109/IVCNZ51579.2020.9290616","DOIUrl":"https://doi.org/10.1109/IVCNZ51579.2020.9290616","url":null,"abstract":"This paper focuses on detecting Alzheimer’s Disease (AD) using the ResNet-18 model on Magnetic Resonance Imaging (MRI). Previous studies have applied different 2D Convolutional Neural Networks (CNNs) to detect AD. The main idea being to split 3D MRI scans into 2D image slices, so that classification can be performed on the image slices independently. This idea allows researchers to benefit from the concept of transfer learning. However, 2D CNNs are incapable of understanding the relationship among 2D image slices in a 3D MRI scan. One solution is to employ 3D CNNs instead of 2D ones. In this paper, we propose a method to utilise transfer learning in 3D CNNs, which allows the transfer of knowledge from 2D image datasets to a 3D image dataset. Both 2D and 3D CNNs are compared in this study, and our results show that introducing transfer learning to a 3D CNN improves the accuracy of an AD detection system. After using an optimisation method in the training process, our approach achieved 96.88% accuracy, 100% sensitivity, and 93.75% specificity.","PeriodicalId":164317,"journal":{"name":"2020 35th International Conference on Image and Vision Computing New Zealand (IVCNZ)","volume":"558 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-11-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116275792","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A Graph-Based Approach to Automatic Convolutional Neural Network Construction for Image Classification
Gonglin Yuan, Bing Xue, Mengjie Zhang
Pub Date: 2020-11-25 | DOI: 10.1109/IVCNZ51579.2020.9290492 | 2020 35th International Conference on Image and Vision Computing New Zealand (IVCNZ)
Convolutional neural networks (CNNs) have achieved great success in image classification in recent years. Usually, human experts are needed to design CNN architectures for different tasks. Evolutionary neural network architecture search can find good CNN architectures automatically, but previous representations of CNN architectures in evolutionary algorithms have many restrictions. In this paper, we propose a new, flexible representation based on directed acyclic graphs to encode CNN architectures, and we develop a genetic algorithm (GA) based architecture search in which the depth of candidate CNNs can vary. Furthermore, we design new crossover and mutation operators that can be applied to individuals of different lengths. The proposed algorithm is evaluated on five widely used datasets. The experimental results show that it achieves very competitive performance against its peer competitors in terms of classification accuracy and number of parameters.
{"title":"A Graph-Based Approach to Automatic Convolutional Neural Network Construction for Image Classification","authors":"Gonglin Yuan, Bing Xue, Mengjie Zhang","doi":"10.1109/IVCNZ51579.2020.9290492","DOIUrl":"https://doi.org/10.1109/IVCNZ51579.2020.9290492","url":null,"abstract":"Convolutional neural networks (CNNs) have achieved great success in the image classification field in recent years. Usually, human experts are needed to design the architectures of CNNs for different tasks. Evolutionary neural network architecture search could find optimal CNN architectures automatically. However, the previous representations of CNN architectures with evolutionary algorithms have many restrictions. In this paper, we propose a new flexible representation based on the directed acyclic graph to encode CNN architectures, to develop a genetic algorithm (GA) based evolutionary neural network architecture, where the depth of candidate CNNs could be variable. Furthermore, we design new crossover and mutation operators, which can be performed on individuals of different lengths. The proposed algorithm is evaluated on five widely used datasets. The experimental results show that the proposed algorithm achieves very competitive performance against its peer competitors in terms of the classification accuracy and number of parameters.","PeriodicalId":164317,"journal":{"name":"2020 35th International Conference on Image and Vision Computing New Zealand (IVCNZ)","volume":"72 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-11-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126361372","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Shadow-based Light Detection for HDR Environment Maps
Andrew Chalmers, Taehyun Rhee
Pub Date: 2020-11-25 | DOI: 10.1109/IVCNZ51579.2020.9290734 | 2020 35th International Conference on Image and Vision Computing New Zealand (IVCNZ)
High dynamic range (HDR) environment maps (EMs) are spherical textures containing HDR pixels used for illuminating virtual scenes with high realism. Detecting as few necessary pixels as possible within the EM is important for a variety of tasks, such as real-time rendering and EM database management. To address this, we propose a shadow-based algorithm for detecting the most dominant light sources within an EM. The algorithm takes into account the relative impact of all other light sources within the upper hemisphere of the texture. This is achieved by decomposing an EM into superpixels, sorting the superpixels from brightest to dimmest, and using ℓ0-norm minimisation to keep only the superpixels needed to maintain the shadow quality of the EM with respect to the just noticeable difference (JND) principle. We show that our method improves upon prior methods in detecting as few lights as possible while still preserving the shadow-casting properties of EMs.
{"title":"Shadow-based Light Detection for HDR Environment Maps","authors":"Andrew Chalmers, Taehyun Rhee","doi":"10.1109/IVCNZ51579.2020.9290734","DOIUrl":"https://doi.org/10.1109/IVCNZ51579.2020.9290734","url":null,"abstract":"High dynamic range (HDR) environment maps (EMs) are spherical textures containing HDR pixels used for illuminating virtual scenes with high realism. Detecting as few necessary pixels as possible within the EM is important for a variety of tasks, such as real-time rendering and EM database management. To address this, we propose a shadow-based algorithm for detecting the most dominant light sources within an EM. This algorithm takes into account the relative impact of all other light sources within the upper-hemisphere of the texture. This is achieved by decomposing an EM into superpixels, sorting the superpixels from brightest to least, and using ℓ0-norm minimisation to keep only the necessary superpixels that maintains the shadow quality of the EM with respect to the just noticeable difference (JND) principle. We show that our method improves upon prior methods in detecting as few lights as possible while still preserving the shadow-casting properties of EMs.","PeriodicalId":164317,"journal":{"name":"2020 35th International Conference on Image and Vision Computing New Zealand (IVCNZ)","volume":"29 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-11-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127445477","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A Wavefront Sensorless Tip/Tilt Removal Method for Correcting Astronomical Images
P. Taghinia, Vishnu Anand Muruganandan, R. Clare, S. Weddell
Pub Date: 2020-11-25 | DOI: 10.1109/IVCNZ51579.2020.9290688 | 2020 35th International Conference on Image and Vision Computing New Zealand (IVCNZ)
Images of astronomical objects captured by ground-based telescopes are distorted by atmospheric turbulence. The phase of the atmospheric aberration is traditionally estimated by a wavefront sensor (WFS), and this information drives a deformable mirror through a control system to restore the image. In this paper, however, we use wavefront sensorless (WFSL) methods, in which no wavefront sensor is present. Given that, for small-aperture telescopes, the largest share of atmospheric turbulence energy is contained in the two tilt modes (tip and tilt), we use WFSL specifically to remove these two modes. The method is shown to be efficient in terms of both speed and accuracy.
{"title":"A Wavefront Sensorless Tip/Tilt Removal method for Correcting Astronomical Images","authors":"P. Taghinia, Vishnu Anand Muruganandan, R. Clare, S. Weddell","doi":"10.1109/IVCNZ51579.2020.9290688","DOIUrl":"https://doi.org/10.1109/IVCNZ51579.2020.9290688","url":null,"abstract":"Images of astronomical objects captured by ground-based telescopes are distorted due to atmospheric turbulence. The phase of the atmospheric aberration is traditionally estimated by a wavefront sensor (WFS). This information is utilised by a deformable mirror through a control system to restore the image. However, in this paper, we utilise wavefront sensorless (WFSL) methods in which the wavefront sensor is absent. Given that the largest share of atmospheric turbulence energy is contained in the 2-axial tilt for small aperture telescopes, we use WFSL to specifically remove these two modes. This method is shown to be efficient in terms of both speed and accuracy.","PeriodicalId":164317,"journal":{"name":"2020 35th International Conference on Image and Vision Computing New Zealand (IVCNZ)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-11-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131036719","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
AI in Photography: Scrutinizing Implementation of Super-Resolution Techniques in Photo-Editors
Noor-ul-ain Fatima
Pub Date: 2020-11-25 | DOI: 10.1109/IVCNZ51579.2020.9290737 | 2020 35th International Conference on Image and Vision Computing New Zealand (IVCNZ)
Judging the quality of a photograph from a photographer's perspective, resolution, symmetry, content, and location are among the factors that determine how good a photograph is. The rapid growth of interest in photography motivates us to find ways to improve an input image with respect to these parameters. While content and location are fixed, attributes such as symmetry and resolution can be improved after capture. In this paper, I focus on resolution, which can be refined in several ways. Image super-resolution is increasingly a prerequisite in computer graphics, computer vision, and image processing: it is the process of obtaining high-resolution images from their low-resolution counterparts. In my work, image super-resolution techniques including interpolation, SRCNN (Super-Resolution Convolutional Neural Network), SRResNet (Super-Resolution Residual Network), and GANs (Generative Adversarial Networks: Super-Resolution GAN, SRGAN, and Conditional GAN, CGAN) were studied experimentally for the post-enhancement of photographs as employed by photo-editors, establishing the most coherent approach for achieving optimised super-resolution in terms of quality.
{"title":"AI in Photography: Scrutinizing Implementation of Super-Resolution Techniques in Photo-Editors","authors":"Noor-ul-ain Fatima","doi":"10.1109/IVCNZ51579.2020.9290737","DOIUrl":"https://doi.org/10.1109/IVCNZ51579.2020.9290737","url":null,"abstract":"Judging the quality of a photograph from the perspective of a photographer we can ascertain resolution, symmetry, content, location, etc. as some of the factors that influence the proficiency of a photograph. The exponential growth in the allurement for photography impels us to discover ways to perfect an input image in terms of the aforesaid parameters. Where content and location are the immutable ones, attributes like symmetry and resolution can be worked upon. In this paper, I prioritized resolution as our cynosure and there can be multiple ways to refine it. Image super-resolution is progressively becoming a prerequisite in the fraternity of computer graphics, computer vision, and image processing. It’s the process of obtaining high-resolution images from their low-resolution counterparts. In my work, image super-resolution techniques like Interpolation, SRCNN (Super-Resolution Convolutional Neural Network), SRResNet (Super Resolution Residual Network), and GANs (Generative Adversarial Networks: Super-Resolution GAN-SRGAN and Conditional GAN-CGAN) were studied experimentally for post-enhancement of images in photography as employed by photo-editors, establishing the most coherent approach for attaining optimized super-resolution in terms of quality.","PeriodicalId":164317,"journal":{"name":"2020 35th International Conference on Image and Vision Computing New Zealand (IVCNZ)","volume":"20 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-11-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123908597","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Variational Autoencoder for 3D Voxel Compression
Juncheng Liu, S. Mills, B. McCane
Pub Date: 2020-11-25 | DOI: 10.1109/IVCNZ51579.2020.9290656 | 2020 35th International Conference on Image and Vision Computing New Zealand (IVCNZ)
3D scene sensing and understanding is a fundamental task in computer vision and robotics. One widely used representation for 3D data is a voxel grid. However, explicit representation of 3D voxels requires a large amount of storage, which is unsuitable for lightweight applications and scenarios such as robotic navigation and exploration. In this paper we propose a method to compress 3D voxel grids using an octree representation and Variational Autoencoders (VAEs). We first capture a 3D voxel grid, in our application with collaborating RealSense D435 and T265 cameras. The voxel grid is decomposed into three types of octants, which are then compressed by the encoder and reproduced by feeding the latent code into the decoder. We demonstrate the efficiency of our method in two applications: scene reconstruction and path planning.
{"title":"Variational Autoencoder for 3D Voxel Compression","authors":"Juncheng Liu, S. Mills, B. McCane","doi":"10.1109/IVCNZ51579.2020.9290656","DOIUrl":"https://doi.org/10.1109/IVCNZ51579.2020.9290656","url":null,"abstract":"3D scene sensing and understanding is a fundamental task in the field of computer vision and robotics. One widely used representation for 3D data is a voxel grid. However, explicit representation of 3D voxels always requires large storage space, which is not suitable for light-weight applications and scenarios such as robotic navigation and exploration. In this paper we propose a method to compress 3D voxel grids using an octree representation and Variational Autoencoders (VAEs). We first capture a 3D voxel grid –in our application with collaborating Realsense D435 and T265 cameras. The voxel grid is decomposed into three types of octants which are then compressed by the encoder and reproduced by feeding the latent code into the decoder. We demonstrate the efficiency of our method by two applications: scene reconstruction and path planing.","PeriodicalId":164317,"journal":{"name":"2020 35th International Conference on Image and Vision Computing New Zealand (IVCNZ)","volume":"26 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-11-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121368114","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Comparison of Face Detection Algorithms on Mobile Devices
Yishi Guo, B. Wünsche
Pub Date: 2020-11-25 | DOI: 10.1109/IVCNZ51579.2020.9290542 | 2020 35th International Conference on Image and Vision Computing New Zealand (IVCNZ)
Face detection is a fundamental task for many computer vision applications such as access control, security, advertisement, automatic payment, and healthcare. Due to technological advances, mobile robots are becoming increasingly common in such applications (e.g. healthcare and security robots), and consequently there is a need for efficient and effective face detection methods on such platforms. Mobile robots have different hardware configurations and operating conditions from desktop applications, e.g. unreliable network connections and the need for lower power consumption. Hence, results for face detection methods on desktop platforms cannot be directly translated to mobile platforms. We compare four common face detection algorithms, Viola-Jones, HOG, MTCNN, and MobileNet-SSD, for use in mobile robotics using different face databases. Our results show that for a typical mobile configuration (Nvidia Jetson TX2), MobileNet-SSD performed best, with 90% detection accuracy on the AFW dataset and a frame rate of almost 10 fps with GPU acceleration. MTCNN had the highest precision and was superior on more difficult face datasets, but did not achieve real-time performance with the given implementation and hardware configuration.
{"title":"Comparison of Face Detection Algorithms on Mobile Devices","authors":"Yishi Guo, B. Wünsche","doi":"10.1109/IVCNZ51579.2020.9290542","DOIUrl":"https://doi.org/10.1109/IVCNZ51579.2020.9290542","url":null,"abstract":"Face detection is a fundamental task for many computer vision applications such as access control, security, advertisement, automatic payment, and healthcare. Due to technological advances mobile robots are becoming increasingly common in such applications (e.g. healthcare and security robots) and consequently there is a need for efficient and effective face detection methods on such platforms. Mobile robots have different hardware configurations and operating conditions from desktop applications, e.g. unreliable network connections and the need for lower power consumption. Hence results for face detection methods on desktop platforms cannot be directly translated to mobile platforms.We compare four common face detection algorithms, Viola-Jones, HOG, MTCNN and MobileNet-SSD, for use in mobile robotics using different face data bases. Our results show that for a typical mobile configuration (Nvidia Jetson TX2) Mobile-NetSSD performed best with 90% detection accuracy for the AFW data set and a frame rate of almost 10 fps with GPU acceleration. MTCNN had the highest precision and was superior for more difficult face data sets, but did not achieve real-time performance with the given implementation and hardware configuration.","PeriodicalId":164317,"journal":{"name":"2020 35th International Conference on Image and Vision Computing New Zealand (IVCNZ)","volume":"425 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-11-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132234027","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Predicting Cherry Quality Using Siamese Networks
Yerren van Sint Annaland, Lech Szymanski, S. Mills
Pub Date: 2020-11-25 | DOI: 10.1109/IVCNZ51579.2020.9290674 | 2020 35th International Conference on Image and Vision Computing New Zealand (IVCNZ)
The cherry industry is a rapidly growing sector of New Zealand's export merchandise and, as such, the accuracy with which pack-houses can grade cherries during processing is becoming increasingly critical. Conventional computer vision systems are usually employed in this process, yet they fall short in many respects, still requiring humans to manually verify the grading. In this work, we investigate the use of deep learning to improve upon the traditional approach. The nature of the industry means that the grade standards are influenced by a range of factors and can change on a daily basis. This makes conventional classification approaches infeasible (as there are no fixed classes), so we construct a model to overcome this. We convert the problem from classification to regression, using a Siamese network trained with pairwise comparison labels, and extract the embedded model to predict continuous quality values for the fruit. Our model is able to predict which of two similar-quality fruit is better with over 88% accuracy, only 5% below the self-agreement of a human expert.
{"title":"Predicting Cherry Quality Using Siamese Networks","authors":"Yerren van Sint Annaland, Lech Szymanski, S. Mills","doi":"10.1109/IVCNZ51579.2020.9290674","DOIUrl":"https://doi.org/10.1109/IVCNZ51579.2020.9290674","url":null,"abstract":"The cherry industry is a rapidly growing sector of New Zealand’s export merchandise and, as such, the accuracy with which pack-houses can grade cherries during processing is becoming increasingly critical. Conventional computer vision systems are usually employed in this process, yet they fall short in many respects, still requiring humans to manually verify the grading. In this work, we investigate the use of deep learning to improve upon the traditional approach. The nature of the industry means that the grade standards are influenced by a range of factors and can change on a daily basis. This makes conventional classification approaches infeasible (as there are no fixed classes) so we construct a model to overcome this. We convert the problem from classification to regression, using a Siamese network trained with pairwise comparison labels. We extract the model embedded within to predict continuous quality values for the fruit. Our model is able to predict which of two similar quality fruit is better with over 88% accuracy, only 5% below the self-agreement of a human expert.","PeriodicalId":164317,"journal":{"name":"2020 35th International Conference on Image and Vision Computing New Zealand (IVCNZ)","volume":"73 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-11-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114391185","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Human Action Recognition Using Deep Learning Methods
Zeqi Yu, W. Yan
Pub Date: 2020-11-25 | DOI: 10.1109/IVCNZ51579.2020.9290594 | 2020 35th International Conference on Image and Vision Computing New Zealand (IVCNZ)
The goal of human action recognition is to identify and understand the actions of people in videos and to export corresponding tags. In addition to the spatial correlation present in 2D images, actions in a video also have attributes in the temporal domain. The complexity of human actions, e.g. changes of perspective and background noise, affects recognition. To address these problems, three algorithms are designed and implemented in this paper: based on convolutional neural networks (CNNs), Two-Stream CNN, CNN+LSTM, and 3D CNN are harnessed to identify human actions in videos. Each algorithm is explained and analysed in detail. The HMDB-51 dataset is used to test these algorithms. Experimental results show that all three methods effectively identify human actions in a given video, and the best algorithm is then selected.
{"title":"Human Action Recognition Using Deep Learning Methods","authors":"Zeqi Yu, W. Yan","doi":"10.1109/IVCNZ51579.2020.9290594","DOIUrl":"https://doi.org/10.1109/IVCNZ51579.2020.9290594","url":null,"abstract":"The goal of human action recognition is to identify and understand the actions of people in videos and export corresponding tags. In addition to spatial correlation existing in 2D images, actions in a video also own the attributes in temporal domain. Due to the complexity of human actions, e.g., the changes of perspectives, background noises, and others will affect the recognition. In order to solve these thorny problems, three algorithms are designed and implemented in this paper. Based on convolutional neural networks (CNN), Two-Stream CNN, CNN+LSTM, and 3D CNN are harnessed to identify human actions in videos. Each algorithm is explicated and analyzed on details. HMDB-51 dataset is applied to test these algorithms and gain the best results. Experimental results showcase that the three methods have effectively identified human actions given a video, the best algorithm thus is selected.","PeriodicalId":164317,"journal":{"name":"2020 35th International Conference on Image and Vision Computing New Zealand (IVCNZ)","volume":"154 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-11-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134215150","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}