A Wavefront Sensorless Tip/Tilt Removal Method for Correcting Astronomical Images
P. Taghinia, Vishnu Anand Muruganandan, R. Clare, S. Weddell
2020 35th International Conference on Image and Vision Computing New Zealand (IVCNZ). Pub Date: 2020-11-25. DOI: 10.1109/IVCNZ51579.2020.9290688
Images of astronomical objects captured by ground-based telescopes are distorted by atmospheric turbulence. The phase of the atmospheric aberration is traditionally estimated by a wavefront sensor (WFS), and this information drives a deformable mirror through a control system to restore the image. In this paper, however, we use wavefront sensorless (WFSL) methods, in which the wavefront sensor is absent. Since, for small-aperture telescopes, the largest share of atmospheric turbulence energy is contained in the two tilt modes (tip and tilt), we apply WFSL specifically to remove these two modes. The method is shown to be efficient in terms of both speed and accuracy.
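The abstract does not specify the WFSL estimator used, but the image-plane signature of tip/tilt is a bulk shift of the point-spread function, so a minimal sensorless sketch (an illustrative assumption, not the paper's method) is to estimate the centroid offset of the image and re-centre it:

```python
import numpy as np

def estimate_tip_tilt(img):
    """Estimate tip/tilt as the intensity centroid's offset from the frame
    centre; a shifted PSF is the image-plane signature of the two tilt modes."""
    ys, xs = np.indices(img.shape)
    total = img.sum()
    cy = (ys * img).sum() / total
    cx = (xs * img).sum() / total
    return cy - (img.shape[0] - 1) / 2.0, cx - (img.shape[1] - 1) / 2.0

def remove_tip_tilt(img):
    """Re-centre the image by the (rounded) estimated tip/tilt shift."""
    dy, dx = estimate_tip_tilt(img)
    return np.roll(img, (-int(round(dy)), -int(round(dx))), axis=(0, 1))

# A star displaced from the centre of a 9x9 frame; true centre is (4, 4).
frame = np.zeros((9, 9))
frame[6, 2] = 1.0
dy, dx = estimate_tip_tilt(frame)
corrected = remove_tip_tilt(frame)
```

In a real system the shift would be applied by a tip/tilt mirror rather than by rolling pixels; the centroid estimate is the cheap part that makes this fast.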
A Graph-Based Approach to Automatic Convolutional Neural Network Construction for Image Classification
Gonglin Yuan, Bing Xue, Mengjie Zhang
IVCNZ 2020. Pub Date: 2020-11-25. DOI: 10.1109/IVCNZ51579.2020.9290492
Convolutional neural networks (CNNs) have achieved great success in image classification in recent years. Usually, human experts are needed to design CNN architectures for different tasks. Evolutionary neural architecture search can find good CNN architectures automatically, but previous representations of CNN architectures in evolutionary algorithms impose many restrictions. In this paper, we propose a new, flexible representation based on directed acyclic graphs to encode CNN architectures, and develop a genetic algorithm (GA) based architecture search in which the depth of candidate CNNs can vary. Furthermore, we design new crossover and mutation operators that can be applied to individuals of different lengths. The proposed algorithm is evaluated on five widely used datasets. The experimental results show that it achieves very competitive performance against its peer competitors in terms of classification accuracy and number of parameters.
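The paper's DAG encoding is not reproduced in the abstract, but the key operator property it names, crossover and mutation on individuals of different lengths, can be sketched on a simplified linear genome (the layer names and probabilities below are illustrative assumptions):

```python
import random

def crossover(parent_a, parent_b, rng=random):
    """One-point crossover for variable-length parents: cut each parent at
    an independent point and swap tails, so child depths can differ from
    both parents'."""
    cut_a = rng.randrange(1, len(parent_a))
    cut_b = rng.randrange(1, len(parent_b))
    return parent_a[:cut_a] + parent_b[cut_b:], parent_b[:cut_b] + parent_a[cut_a:]

def mutate(genome, layer_pool, p_add=0.2, p_del=0.2, rng=random):
    """Length-changing mutation: possibly insert a random layer and/or
    delete one, letting evolution explore different depths."""
    genome = list(genome)
    if rng.random() < p_add:
        genome.insert(rng.randrange(len(genome) + 1), rng.choice(layer_pool))
    if rng.random() < p_del and len(genome) > 1:
        genome.pop(rng.randrange(len(genome)))
    return genome

a = ["conv3-32", "conv3-64", "pool", "fc-10"]
b = ["conv5-16", "pool", "conv3-32", "conv3-64", "pool", "fc-10"]
c1, c2 = crossover(a, b, random.Random(0))
```

The paper's actual genome is a DAG rather than a chain, which additionally allows skip connections; the tail-swap idea carries over by cutting at graph nodes instead of list positions.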
Shadow-based Light Detection for HDR Environment Maps
Andrew Chalmers, Taehyun Rhee
IVCNZ 2020. Pub Date: 2020-11-25. DOI: 10.1109/IVCNZ51579.2020.9290734
High dynamic range (HDR) environment maps (EMs) are spherical textures containing HDR pixels, used for illuminating virtual scenes with high realism. Detecting as few necessary pixels as possible within the EM is important for a variety of tasks, such as real-time rendering and EM database management. To address this, we propose a shadow-based algorithm for detecting the most dominant light sources within an EM. The algorithm takes into account the relative impact of all other light sources within the upper hemisphere of the texture. This is achieved by decomposing an EM into superpixels, sorting the superpixels from brightest to dimmest, and using ℓ0-norm minimisation to keep only the necessary superpixels that maintain the shadow quality of the EM with respect to the just-noticeable-difference (JND) principle. We show that our method improves on prior methods in detecting as few lights as possible while preserving the shadow-casting properties of EMs.
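The sort-and-prune core of the pipeline can be sketched as follows. Note the stopping rule here is a simple intensity-energy threshold, a stand-in assumption; the paper instead drops superpixels until rendered shadows change by a just noticeable difference:

```python
import numpy as np

def dominant_lights(superpixel_means, keep_fraction=0.9):
    """Greedy stand-in for the paper's l0 selection: sort superpixels from
    brightest to dimmest and keep the smallest prefix whose summed intensity
    reaches `keep_fraction` of the total."""
    order = np.argsort(superpixel_means)[::-1]      # brightest first
    cum = np.cumsum(superpixel_means[order])
    k = int(np.searchsorted(cum, keep_fraction * cum[-1])) + 1
    return order[:k]

# Toy EM: two strong lights among dim ambient superpixels.
means = np.array([0.1, 8.0, 0.2, 6.0, 0.1, 0.15])
kept = dominant_lights(means, keep_fraction=0.9)
```

A shadow-based criterion is stricter than an energy one: a moderately bright but large area source may matter less for shadows than a small, intense one, which is the motivation for the paper's JND test.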
Isotropic Remeshing by Dynamic Voronoi Tessellation on Voxelized Surface
Ashutosh Soni, Partha Bhowmick
IVCNZ 2020. Pub Date: 2020-11-25. DOI: 10.1109/IVCNZ51579.2020.9290614
A novel algorithm for isotropic remeshing of a triangle mesh is presented in this paper. The algorithm is designed to work on a voxelized surface and integrates several novel ideas. One such idea is the notion of functional partitioning, which aids uniform distribution of seeds for initializing the process of dynamic Voronoi tessellation (DVT). The concept of DVT is also novel and proves quite effective for iteratively transforming the input mesh into an isotropic mesh while keeping the tessellation aligned with the surface geometry. In each iteration, a Voronoi energy field is used to rearrange the seeds and recreate the DVT. Over successive iterations, the DVT keeps improving the mesh isotropy without compromising surface features. The Delaunay triangles corresponding to the final tessellation are further subdivided in high-curvature regions. The resultant mesh is finally projected back onto the original mesh to minimize the Hausdorff error. As our algorithm works in voxel space, it is readily implementable on a GPU. Experimental results on various datasets demonstrate its efficiency and robustness.
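The iterate-and-rearrange loop can be illustrated with a deliberately simplified stand-in: nearest-seed assignment over surface voxels followed by a Lloyd-style centroid update. This is an assumption for illustration; the paper drives the rearrangement with a Voronoi energy field on the voxelized surface, not plain centroids:

```python
import numpy as np

def voronoi_labels(points, seeds):
    """Assign every surface voxel to its nearest seed: one tessellation
    step of the (simplified) dynamic Voronoi iteration."""
    d = np.linalg.norm(points[:, None, :] - seeds[None, :, :], axis=2)
    return d.argmin(axis=1)

def relax_seeds(points, seeds):
    """Move each seed to the centroid of its Voronoi cell, nudging the
    tessellation toward isotropy (empty cells keep their seed)."""
    labels = voronoi_labels(points, seeds)
    new = np.array([points[labels == i].mean(axis=0) if (labels == i).any()
                    else seeds[i] for i in range(len(seeds))])
    return new, labels

pts = np.array([[0., 0, 0], [1, 0, 0], [9, 0, 0], [10, 0, 0]])
seeds = np.array([[0., 0, 0], [10., 0, 0]])
new_seeds, labels = relax_seeds(pts, seeds)
```

On a real voxelized surface the centroid must also be projected back onto the surface, mirroring the paper's final projection step that bounds the Hausdorff error.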
Wavefront reconstruction with the cone sensor
R. Clare, B. Engler, S. Weddell
IVCNZ 2020. Pub Date: 2020-11-25. DOI: 10.1109/IVCNZ51579.2020.9290735
Wavefronts of light from celestial objects are aberrated by Earth’s evolving atmosphere, causing images captured by ground-based telescopes to be distorted. The slope of the phase of the wavefront can be estimated by a pyramid wavefront sensor, which subdivides the complex field at the focal plane of the telescope, producing four images of the aperture. The cone wavefront sensor extends the pyramid sensor to an infinite number of sides, and produces an annulus of intensity rather than four images. We propose and compare the following methods for reconstructing the wavefront from the cone sensor’s intensity measurements: (1) use the entire aperture image; (2) use only the pixels inside the intensity annulus; (3) create a map of slopes by subtracting the slice of the annulus 180 degrees opposite; (4) create x and y slopes by cutting out pseudo-apertures around the annulus; and (5) use the inverse Radon transform of the intensity annulus converted to polar co-ordinates. We find via numerical simulation with atmospheric phase screens that methods (1) and (2) provide the best wavefront estimate, methods (3) and (4) give the smallest interaction matrices, and method (5) allows direct reconstruction without an interaction matrix.
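Method (3) reduces to a one-liner once the annulus intensity has been unwrapped to polar co-ordinates. The sketch below assumes a 1-D azimuthal sampling (the real sensor also has a radial extent, which is omitted here):

```python
import numpy as np

def annulus_slopes(intensity_polar):
    """Method (3) in sketch form: a slope signal is each azimuthal slice of
    the annulus minus the slice 180 degrees opposite."""
    n = intensity_polar.shape[0]          # samples over 0..360 degrees
    assert n % 2 == 0, "need an even number of azimuth samples"
    return intensity_polar - np.roll(intensity_polar, n // 2)

theta = np.linspace(0, 2 * np.pi, 8, endpoint=False)
signal = 1.0 + 0.5 * np.cos(theta)        # a pure tilt-like modulation
slopes = annulus_slopes(signal)
```

The subtraction cancels the common (symmetric) part of the annulus, which is why the resulting interaction matrix is small: only the antisymmetric, slope-carrying component survives.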
AI in Photography: Scrutinizing Implementation of Super-Resolution Techniques in Photo-Editors
Noor-ul-ain Fatima
IVCNZ 2020. Pub Date: 2020-11-25. DOI: 10.1109/IVCNZ51579.2020.9290737
Judging the quality of a photograph from a photographer’s perspective, factors such as resolution, symmetry, content, and location influence how proficient a photograph appears. The rapid growth of interest in photography motivates finding ways to improve an input image with respect to these parameters. While content and location are immutable, attributes such as symmetry and resolution can be improved. In this paper, I focus on resolution, which can be refined in multiple ways. Image super-resolution is progressively becoming a prerequisite in computer graphics, computer vision, and image processing; it is the process of obtaining high-resolution images from their low-resolution counterparts. In my work, image super-resolution techniques including interpolation, SRCNN (Super-Resolution Convolutional Neural Network), SRResNet (Super-Resolution Residual Network), and GANs (Generative Adversarial Networks: Super-Resolution GAN, SRGAN, and Conditional GAN, CGAN) were studied experimentally for post-enhancement of images in photo-editors, establishing the most coherent approach for attaining high-quality super-resolution.
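Of the techniques compared, interpolation is the only one simple enough to sketch directly. A minimal bilinear upscaler (align-corners style sampling, integer factor; a baseline sketch, not the paper's implementation):

```python
import numpy as np

def bilinear_upscale(img, factor):
    """Bilinear upsampling of a 2-D image by an integer factor: each output
    pixel is a distance-weighted blend of its four nearest input pixels."""
    h, w = img.shape
    ys = np.linspace(0, h - 1, h * factor)
    xs = np.linspace(0, w - 1, w * factor)
    y0 = np.floor(ys).astype(int); y1 = np.minimum(y0 + 1, h - 1)
    x0 = np.floor(xs).astype(int); x1 = np.minimum(x0 + 1, w - 1)
    wy = (ys - y0)[:, None]
    wx = (xs - x0)[None, :]
    top = img[np.ix_(y0, x0)] * (1 - wx) + img[np.ix_(y0, x1)] * wx
    bot = img[np.ix_(y1, x0)] * (1 - wx) + img[np.ix_(y1, x1)] * wx
    return top * (1 - wy) + bot * wy

low = np.array([[0.0, 1.0], [1.0, 2.0]])
high = bilinear_upscale(low, 2)
```

Learned methods such as SRCNN and SRGAN are judged against exactly this kind of baseline: interpolation cannot add high-frequency detail, which is the gap the networks are trained to fill.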
Variational Autoencoder for 3D Voxel Compression
Juncheng Liu, S. Mills, B. McCane
IVCNZ 2020. Pub Date: 2020-11-25. DOI: 10.1109/IVCNZ51579.2020.9290656
3D scene sensing and understanding is a fundamental task in computer vision and robotics. One widely used representation for 3D data is the voxel grid. However, explicit representation of 3D voxels requires large storage space, which is unsuitable for lightweight applications and scenarios such as robotic navigation and exploration. In this paper we propose a method to compress 3D voxel grids using an octree representation and variational autoencoders (VAEs). We first capture a 3D voxel grid, in our application with collaborating RealSense D435 and T265 cameras. The voxel grid is decomposed into three types of octants, which are compressed by the encoder and reproduced by feeding the latent code into the decoder. We demonstrate the efficiency of our method with two applications: scene reconstruction and path planning.
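The octree front end can be sketched without the VAE. One plausible reading of the "three types of octants" is empty / full / mixed (an assumption; the abstract does not define the types), where only mixed octants carry detail worth encoding:

```python
import numpy as np

def classify_octants(grid):
    """Split a cubic occupancy grid into its 8 octants and classify each as
    'empty', 'full', or 'mixed'; only mixed octants would need to be sent
    through a learned encoder, the uniform ones compress to a single label."""
    n = grid.shape[0] // 2
    octants = {}
    for i in (0, 1):
        for j in (0, 1):
            for k in (0, 1):
                block = grid[i*n:(i+1)*n, j*n:(j+1)*n, k*n:(k+1)*n]
                if not block.any():
                    kind = "empty"
                elif block.all():
                    kind = "full"
                else:
                    kind = "mixed"
                octants[(i, j, k)] = kind
    return octants

grid = np.zeros((4, 4, 4), dtype=bool)
grid[:2, :2, :2] = True      # one fully occupied octant
grid[3, 3, 3] = True         # one partially occupied octant
kinds = classify_octants(grid)
```

Applied recursively, this yields the usual octree: uniform regions terminate early, so storage scales with surface complexity rather than volume, which is what makes the scheme attractive on a robot.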
Comparison of Face Detection Algorithms on Mobile Devices
Yishi Guo, B. Wünsche
IVCNZ 2020. Pub Date: 2020-11-25. DOI: 10.1109/IVCNZ51579.2020.9290542
Face detection is a fundamental task for many computer vision applications such as access control, security, advertisement, automatic payment, and healthcare. Due to technological advances, mobile robots are becoming increasingly common in such applications (e.g. healthcare and security robots), and consequently there is a need for efficient and effective face detection methods on such platforms. Mobile robots have different hardware configurations and operating conditions from desktop applications, e.g. unreliable network connections and the need for lower power consumption. Hence results for face detection methods on desktop platforms cannot be directly translated to mobile platforms. We compare four common face detection algorithms, Viola-Jones, HOG, MTCNN, and MobileNet-SSD, for use in mobile robotics, using different face databases. Our results show that for a typical mobile configuration (Nvidia Jetson TX2), MobileNet-SSD performed best, with 90% detection accuracy on the AFW dataset and a frame rate of almost 10 fps with GPU acceleration. MTCNN had the highest precision and was superior on more difficult face datasets, but did not achieve real-time performance with the given implementation and hardware configuration.
Predicting Cherry Quality Using Siamese Networks
Yerren van Sint Annaland, Lech Szymanski, S. Mills
IVCNZ 2020. Pub Date: 2020-11-25. DOI: 10.1109/IVCNZ51579.2020.9290674
The cherry industry is a rapidly growing sector of New Zealand’s export merchandise and, as such, the accuracy with which pack-houses can grade cherries during processing is becoming increasingly critical. Conventional computer vision systems are usually employed in this process, yet they fall short in many respects, still requiring humans to manually verify the grading. In this work, we investigate the use of deep learning to improve upon the traditional approach. The nature of the industry means that grade standards are influenced by a range of factors and can change daily. This makes conventional classification approaches infeasible (as there are no fixed classes), so we construct a model to overcome this. We convert the problem from classification to regression, using a Siamese network trained with pairwise comparison labels, and extract the embedded model to predict continuous quality values for the fruit. Our model predicts which of two similar-quality fruit is better with over 88% accuracy, only 5% below the self-agreement of a human expert.
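The comparison-labels-to-continuous-scores idea can be shown in miniature with a Bradley-Terry model fitted by gradient ascent. This is a stand-in assumption for illustration: in the paper the same role is played by the Siamese network's learned embedding, with images rather than item indices as input:

```python
import math

def fit_scores(n_items, comparisons, lr=0.5, epochs=200):
    """Turn pairwise 'i beats j' labels into continuous quality scores by
    gradient ascent on a Bradley-Terry likelihood: the probability that i
    beats j is sigmoid(s[i] - s[j])."""
    s = [0.0] * n_items
    for _ in range(epochs):
        for i, j in comparisons:          # item i judged better than item j
            p = 1.0 / (1.0 + math.exp(-(s[i] - s[j])))
            g = lr * (1.0 - p)            # gradient of the log-likelihood
            s[i] += g
            s[j] -= g
    return s

# Fruit 0 beats fruit 1, fruit 1 beats fruit 2.
scores = fit_scores(3, [(0, 1), (1, 2)])
```

Either way, the payoff is the same: pairwise labels are cheap for graders to provide and sidestep the need for fixed grade classes, yet the fitted scalar supports any day's grading threshold.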
Human Action Recognition Using Deep Learning Methods
Zeqi Yu, W. Yan
IVCNZ 2020. Pub Date: 2020-11-25. DOI: 10.1109/IVCNZ51579.2020.9290594
The goal of human action recognition is to identify and understand the actions of people in videos and export corresponding tags. In addition to the spatial correlation present in 2D images, actions in a video also have structure in the temporal domain. Human actions are complex: changes of perspective, background noise, and other factors affect recognition. To address these problems, three algorithms are designed and implemented in this paper. Based on convolutional neural networks (CNNs), a Two-Stream CNN, a CNN+LSTM, and a 3D CNN are used to identify human actions in videos. Each algorithm is explained and analysed in detail. The HMDB-51 dataset is used to evaluate the algorithms. Experimental results show that all three methods effectively identify human actions in video, and the best-performing algorithm is selected.
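The Two-Stream design combines the spatial (RGB) and temporal (optical-flow) pathways by late fusion of their class distributions. A minimal sketch of that fusion step, with the network forward passes replaced by hypothetical per-class logits:

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax over a logit vector."""
    e = np.exp(z - z.max())
    return e / e.sum()

def two_stream_fusion(spatial_logits, temporal_logits, w=0.5):
    """Late fusion as in a Two-Stream CNN: average the class distributions
    from the RGB (spatial) and optical-flow (temporal) streams."""
    return w * softmax(spatial_logits) + (1 - w) * softmax(temporal_logits)

spatial = np.array([2.0, 0.5, 0.1])    # hypothetical per-class logits
temporal = np.array([0.2, 1.0, 0.4])
fused = two_stream_fusion(spatial, temporal)
label = int(fused.argmax())
```

The CNN+LSTM and 3D CNN alternatives instead fold the temporal dimension into the model itself, via recurrence over frame features or 3D convolutions respectively, so they need no separate fusion step.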