
2020 35th International Conference on Image and Vision Computing New Zealand (IVCNZ): latest publications

Deep Learning Methods for Human Behavior Recognition
Pub Date : 2020-11-25 DOI: 10.1109/IVCNZ51579.2020.9290640
Jia Lu, M. Nguyen, W. Yan
In this paper, we investigate the problem of human behavior recognition using state-of-the-art deep learning methods. In order to achieve sufficient recognition accuracy, both spatial and temporal information was acquired to implement the recognition in this project. We propose a novel YOLOv4 + LSTM network, which yields promising results for real-time recognition. For the purpose of comparison, we implement a Selective Kernel Network (SKNet) with an attention mechanism. The key contributions of this paper are: (1) a YOLOv4 + LSTM network that achieves 97.87% accuracy on our own dataset by using spatiotemporal information from pre-recorded video footage; (2) an SKNet-with-attention model that attains the best human behaviour recognition accuracy, up to 98.7%, across multiple public datasets.
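The paper itself does not include code, but the spatiotemporal pipeline it describes, a per-frame detector feeding a recurrent classifier, can be sketched as below. This is a minimal sketch, assuming the YOLOv4 stage has already been reduced to one feature vector per frame; the feature dimension, sequence length and class count are illustrative, not the authors' settings.

```python
# Minimal sketch of a detector-features + LSTM behaviour classifier (not the authors' code).
# Assumes each video clip has already been reduced to a sequence of per-frame feature
# vectors (e.g. pooled from a YOLOv4 backbone); all dimensions below are illustrative.
import torch
import torch.nn as nn

class BehaviourLSTM(nn.Module):
    def __init__(self, feat_dim=256, hidden_dim=128, num_classes=5):
        super().__init__()
        self.lstm = nn.LSTM(feat_dim, hidden_dim, batch_first=True)
        self.head = nn.Linear(hidden_dim, num_classes)

    def forward(self, x):                  # x: (batch, frames, feat_dim)
        _, (h_n, _) = self.lstm(x)         # h_n: (1, batch, hidden_dim)
        return self.head(h_n[-1])          # one set of class logits per clip

model = BehaviourLSTM()
clip_features = torch.randn(2, 30, 256)    # 2 clips, 30 frames each
logits = model(clip_features)
print(logits.shape)                        # torch.Size([2, 5])
```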
Citations: 8
Evaluating Learned State Representations for Atari
Pub Date : 2020-11-25 DOI: 10.1109/IVCNZ51579.2020.9290609
Adam Tupper, K. Neshatian
Deep reinforcement learning, the combination of deep learning and reinforcement learning, has enabled the training of agents that can solve complex tasks from visual inputs. However, these methods often require prohibitive amounts of computation to obtain successful results. To improve learning efficiency, there has been a renewed focus on separating state representation and policy learning. In this paper, we investigate the quality of state representations learned by different types of autoencoders, a popular class of neural networks used for representation learning. We assess not only the quality of the representations learned by undercomplete, variational, and disentangled variational autoencoders, but also how the quality of the learned representations is affected by changes in representation size. To accomplish this, we also present a new method for evaluating learned state representations for Atari games using the Atari Annotated RAM Interface. Our findings highlight differences in the quality of state representations learned by different types of autoencoders and their robustness to reduction in representation size. Our results also demonstrate the advantage of using more sophisticated evaluation methods over assessing reconstruction quality.
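As a hedged illustration of the representation-learning setup evaluated here, an undercomplete autoencoder with a configurable bottleneck (the representation size being varied) might look like the sketch below; the flattened 84x84 frame input is an assumed Atari preprocessing choice, not necessarily the paper's.

```python
# Sketch of an undercomplete autoencoder whose bottleneck width is the "representation size".
# The 84x84 grayscale frame input (flattened) is a common Atari preprocessing choice,
# assumed here for illustration only.
import torch
import torch.nn as nn

class UndercompleteAE(nn.Module):
    def __init__(self, input_dim=84 * 84, latent_dim=64):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(input_dim, 512), nn.ReLU(),
                                     nn.Linear(512, latent_dim))
        self.decoder = nn.Sequential(nn.Linear(latent_dim, 512), nn.ReLU(),
                                     nn.Linear(512, input_dim), nn.Sigmoid())

    def forward(self, x):
        z = self.encoder(x)                # learned state representation
        return self.decoder(z), z

ae = UndercompleteAE(latent_dim=32)        # vary latent_dim to study representation size
frames = torch.rand(8, 84 * 84)
recon, latent = ae(frames)
loss = nn.functional.mse_loss(recon, frames)   # reconstruction objective
```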
Citations: 1
Vehicle-Related Scene Segmentation Using CapsNets
Pub Date : 2020-11-25 DOI: 10.1109/IVCNZ51579.2020.9290664
Xiaoxu Liu, W. Yan, N. Kasabov
Understanding traffic scenes is a significant research problem in computer vision. In this paper, we present and implement a robust scene segmentation model using a capsule network (CapsNet) as the basic framework. We collected a large number of image samples of Auckland motorway traffic scenes and labelled the data for multiple classes. The contribution of this paper is that our model facilitates better scene understanding based on a matrix representation of pose and spatial relationships. We take a step forward towards effectively solving the Picasso problem. The methods are based on deep learning and reduce human manipulation of data by completing the training process with only a small amount of training data. Our model achieves a preliminary accuracy of up to 74.61% on our own dataset.
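The model builds on capsule networks; a minimal sketch of the two generic CapsNet operations behind the pose-aware representation mentioned above, the squash nonlinearity and routing-by-agreement, is given below. This follows the standard CapsNet formulation and is not the authors' segmentation architecture; all sizes are illustrative.

```python
# Sketch of the core CapsNet operations (squash + dynamic routing), per the generic
# capsule-network formulation; not the authors' scene segmentation model.
import torch

def squash(s, dim=-1, eps=1e-8):
    # Shrinks short vectors toward zero and long vectors toward unit length,
    # so a capsule's length can be read as an existence probability.
    sq_norm = (s ** 2).sum(dim=dim, keepdim=True)
    return (sq_norm / (1.0 + sq_norm)) * s / torch.sqrt(sq_norm + eps)

def dynamic_routing(u_hat, num_iters=3):
    # u_hat: predictions from lower capsules, shape (batch, n_in, n_out, dim_out)
    b = torch.zeros(u_hat.shape[:3], device=u_hat.device)   # routing logits
    for _ in range(num_iters):
        c = torch.softmax(b, dim=2)                          # coupling coefficients
        s = (c.unsqueeze(-1) * u_hat).sum(dim=1)             # weighted sum per output capsule
        v = squash(s)                                        # output capsule poses
        b = b + (u_hat * v.unsqueeze(1)).sum(dim=-1)         # agreement updates the logits
    return v                                                 # (batch, n_out, dim_out)

u_hat = torch.randn(2, 1152, 10, 16)        # illustrative sizes (primary -> class capsules)
class_capsules = dynamic_routing(u_hat)
```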
Citations: 4
Defects Detection in Highly Specular Surface using a Combination of Stereo and Laser Reconstruction
Pub Date : 2020-11-25 DOI: 10.1109/IVCNZ51579.2020.9290660
Arpita Dawda, M. Nguyen
Product inspection is an indispensable tool of the current manufacturing process. It helps maintain product quality and reduces manufacturing costs by eliminating scrap losses [1]. In the modern era, the inspection process also needs to be automatic, fast and accurate [2]. “Machine vision is the technology and methods used to provide imaging-based automatic inspection and analysis [3].” However, highly specular (mirror-like) surfaces remain a limitation of many state-of-the-art three-dimensional (3D) reconstruction approaches. The specularity of the outer surface makes it difficult to reconstruct the product model accurately in 3D. Along with accurate measurements, it is also essential to detect defects such as dents, bumps, cracks and scratches present in a product. As these defects are palpable yet not readily visible to a camera, they are difficult to detect using vision-based inspection techniques in ambient lighting conditions. This paper presents an automated defect detection technique using the concepts of laser line projection and stereo vision. This research activity evolved from a previous study in which the ideas of stereo-vision reconstruction and laser line projection were used for accurate 3D measurement of highly specular surfaces. In this paper, the detection of three defect types (dents, scratches and bumps) is examined in ambient lighting conditions. In the end, the output 3D profile of the defective product is compared with that of a non-defective product for accuracy evaluation.
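The final accuracy-evaluation step, comparing the reconstructed 3D profile of an inspected part with a defect-free one, can be approximated by a nearest-neighbour deviation map between point clouds, as in the sketch below; the threshold and the synthetic clouds are illustrative assumptions, not the paper's procedure or data.

```python
# Sketch of comparing a reconstructed 3D profile against a defect-free reference:
# points that sit far from the reference surface are flagged as candidate defects
# (dents/bumps). Threshold and data are illustrative, not the paper's values.
import numpy as np
from scipy.spatial import cKDTree

def flag_defects(scan_points, reference_points, threshold_mm=0.5):
    tree = cKDTree(reference_points)
    dist, _ = tree.query(scan_points)          # distance of each scanned point to the reference
    return dist > threshold_mm, dist           # boolean defect mask + deviation map

rng = np.random.default_rng(0)
reference = rng.uniform(0, 100, size=(5000, 3))
scan = reference + rng.normal(0, 0.05, size=reference.shape)
scan[:50, 2] += 2.0                            # simulate a small bump
mask, deviation = flag_defects(scan, reference)
print(mask.sum(), "points flagged as defective")
```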
Citations: 2
Class Probability-based Visual and Contextual Feature Integration for Image Parsing
Pub Date : 2020-11-25 DOI: 10.1109/IVCNZ51579.2020.9290686
Basim Azam, Ranju Mandal, Ligang Zhang, B. Verma
Deep learning networks have become one of the most promising architectures for image parsing tasks. Although existing deep networks consider global and local contextual information of images to learn coarse features individually, they lack automatic adaptation to the contextual properties of scenes. In this work, we present a visual and contextual feature-based deep network for image parsing. The main novelty is the 3-layer architecture, which considers contextual information, with each layer independently trained and then integrated. The network explores contextual features along with visual features for class label prediction using class-specific classifiers. The contextual features capture the prior information learned by calculating the co-occurrence of object labels both within a whole scene and between neighboring superpixels. The class-specific classifiers deal with the imbalance of data across object categories and learn the coarse features for every category individually. A series of weak classifiers in combination with boosting algorithms is investigated, together with the aggregated contextual features. The experiments were conducted on the benchmark Stanford background dataset and showed that the proposed architecture produces the highest average accuracy and comparable global accuracy.
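The contextual prior described above, the co-occurrence of object labels between neighbouring superpixels, can be sketched as a simple normalised count matrix; the superpixel labels and adjacency pairs below are assumed to come from an earlier oversegmentation step, and the numbers are illustrative.

```python
# Sketch of a label co-occurrence prior over neighbouring superpixels, as one way to
# realise the contextual features described above. Inputs (per-superpixel labels and
# adjacency pairs) are assumed to come from an earlier oversegmentation stage.
import numpy as np

def cooccurrence_prior(labels, adjacency, num_classes):
    # labels: (n_superpixels,) ground-truth class per superpixel
    # adjacency: list of (i, j) index pairs of neighbouring superpixels
    counts = np.zeros((num_classes, num_classes), dtype=np.float64)
    for i, j in adjacency:
        counts[labels[i], labels[j]] += 1
        counts[labels[j], labels[i]] += 1
    counts += 1e-6                                      # smoothing avoids zero probabilities
    return counts / counts.sum(axis=1, keepdims=True)   # row-normalised P(neighbour class | class)

labels = np.array([0, 0, 1, 2, 1])
adjacency = [(0, 1), (1, 2), (2, 3), (3, 4)]
prior = cooccurrence_prior(labels, adjacency, num_classes=3)
print(prior)
```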
Citations: 1
Fast Portrait Segmentation of the Head and Upper Body
Pub Date : 2020-11-25 DOI: 10.1109/IVCNZ51579.2020.9290654
S. Loke, B. MacDonald, Matthew Parsons, B. Wünsche
Portrait segmentation is the process whereby the head and upper body of a person are separated from the background of an image or video stream. This is difficult to achieve accurately, although good results have been obtained with deep learning methods, which cope well with occlusion, pose and illumination changes. These are, however, either slow or require a powerful system to operate in real-time. We present a new method of portrait segmentation called FaceSeg, which uses fast DBSCAN clustering combined with smart face tracking to replicate the benefits and accuracy of deep learning methods at a much faster speed. In a direct comparison using a standard testing suite, our method achieved a segmentation speed of 150 fps for a 640x480 video stream, with median accuracy and F1 scores of 99.96% and 99.93% respectively on simple backgrounds, and 98.81% and 98.13% on complex backgrounds. The state-of-the-art deep learning based FastPortrait / Mobile Neural Network method achieved 15 fps with 99.95% accuracy and a 99.91% F1 score on simple backgrounds, and 99.01% accuracy and a 98.43% F1 score on complex backgrounds. An efficacy-boosted implementation of FaceSeg can achieve 75 fps with 99.23% accuracy and a 98.79% F1 score on complex backgrounds.
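As an illustrative stand-in for the clustering stage (not the authors' fast DBSCAN variant, and omitting the face-tracking component), standard DBSCAN can be run on position-plus-colour features of a heavily downsampled frame; the scale factor, eps and min_samples values below are assumptions.

```python
# Sketch of clustering a downsampled frame with DBSCAN on (x, y, B, G, R) features.
# Parameters (scale factor, eps, min_samples) are illustrative; the paper's fast DBSCAN
# variant and smart face-tracking logic are not reproduced here.
import numpy as np
import cv2
from sklearn.cluster import DBSCAN

def cluster_frame(frame_bgr, scale=0.1, eps=8.0, min_samples=20):
    small = cv2.resize(frame_bgr, None, fx=scale, fy=scale)
    h, w = small.shape[:2]
    ys, xs = np.mgrid[0:h, 0:w]
    feats = np.column_stack([xs.ravel(), ys.ravel(),
                             small.reshape(-1, 3).astype(np.float32)])
    labels = DBSCAN(eps=eps, min_samples=min_samples).fit_predict(feats)
    return labels.reshape(h, w)              # per-pixel cluster id (-1 = noise)

frame = np.full((480, 640, 3), 200, dtype=np.uint8)
cv2.circle(frame, (320, 240), 80, (30, 60, 90), -1)   # synthetic "head" region
segments = cluster_frame(frame)
print(np.unique(segments))
```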
Citations: 1
Machine Learning with Synthetic Data – a New Way to Learn and Classify the Pictorial Augmented Reality Markers in Real-Time
Pub Date : 2020-11-25 DOI: 10.1109/IVCNZ51579.2020.9290606
H. Le, M. Nguyen, W. Yan
The idea of Augmented Reality (AR) appeared in the early 1960s and has recently received a large amount of public attention. AR allows us to work, learn, play, and connect with the world around us both virtually and physically in real-time. However, picking an AR marker that matches users' needs is one of the most challenging tasks due to different marker encryption/decryption methods and essential requirements. Barcode AR cards are fast and efficient, but they do not contain much visual information; pictorial coloured AR cards, on the other hand, are slow and unreliable. This paper proposes a solution for obtaining detectable arbitrary pictorial/colour AR cards in real-time by applying the benefits of machine learning and the power of synthetic data generation techniques. This technique addresses the labour-intensive task of manual annotation when building a massive deep-learning training dataset. Thus, with a small number of AR-enhanced target figures as input (as few as one for each coloured card), the synthetic data generation process produces a deep-learning trainable dataset using computer-graphic rendering techniques (tens of thousands of images from just one input image). Second, the generated dataset is trained with a chosen object recognition convolutional neural network, acting as the AR marker tracking functionality. Our proposed idea works effectively without modifying the original contents of the chosen AR card. The benefits of using synthetic data generation techniques help us improve AR marker recognition accuracy and reduce marker registration time. The trained model is capable of processing video sequences at approximately 25 frames per second without GPU acceleration, which is suitable for AR experiences on mobile/web platforms. We believe that this could be a promising low-cost AR approach in many areas, such as education and gaming.
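A hedged sketch of the synthetic-data idea follows: a single marker image is warped by random perspective transforms and composited onto random backgrounds to mass-produce labelled training images. The file path, jitter range and compositing details are illustrative assumptions, not the paper's rendering pipeline.

```python
# Sketch of generating synthetic training images from a single AR marker image:
# random perspective warps composited onto random backgrounds. The marker path,
# jitter range and counts are illustrative assumptions, not the paper's pipeline.
import numpy as np
import cv2

def synthesize(marker, background, jitter=40, rng=None):
    rng = np.random.default_rng() if rng is None else rng
    h, w = marker.shape[:2]
    bh, bw = background.shape[:2]
    src = np.float32([[0, 0], [w, 0], [w, h], [0, h]])
    # Random quadrilateral inside the background simulating a new viewpoint.
    base = np.float32([[50, 50], [bw - 50, 50], [bw - 50, bh - 50], [50, bh - 50]])
    dst = base + rng.uniform(-jitter, jitter, size=(4, 2)).astype(np.float32)
    H = cv2.getPerspectiveTransform(src, dst)
    warped = cv2.warpPerspective(marker, H, (bw, bh))
    mask = cv2.warpPerspective(np.full((h, w), 255, np.uint8), H, (bw, bh))
    out = background.copy()
    out[mask > 0] = warped[mask > 0]
    return out, dst                        # synthetic image plus marker corner labels

marker = cv2.imread("marker.png")          # hypothetical input marker image
bg = np.random.randint(0, 255, (480, 640, 3), dtype=np.uint8)
if marker is not None:
    sample, corners = synthesize(marker, bg)
```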
Citations: 1
Automatically localising ROIs in hyperspectral images using background subtraction techniques
Pub Date : 2020-11-25 DOI: 10.1109/IVCNZ51579.2020.9290728
Munir Shah, V. Cave, Marlon dos Reis
The use of snapshot hyperspectral cameras is becoming increasingly popular in agricultural scientific studies. One of the key steps in processing experimental hyperspectral data is to precisely locate the sample material under study and separate it from other background material, such as sampling instruments or equipment. This is very laborious work, especially for hyperspectral imaging scenarios where there might be a few hundred spectral images per sample. In this paper we propose a multiple-background modelling approach for automatically localising the Regions of Interest (ROIs) in hyperspectral images. The two key components of this method are i) modelling each spectral band individually and ii) applying a consensus algorithm to obtain the final ROIs for the whole hyperspectral image. Our proposed approach achieves approximately a 14% improvement in ROI detection in hyperspectral images compared to traditional video background modelling techniques.
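A minimal sketch of the two components named above, a per-band foreground mask followed by a cross-band consensus vote, is shown below; the simple absolute-difference model and the vote threshold are simplifying assumptions standing in for the paper's per-band background models.

```python
# Sketch of per-band background subtraction followed by a majority-vote consensus
# across spectral bands. A plain absolute-difference model stands in for whatever
# per-band background model is actually used; thresholds are illustrative.
import numpy as np

def hyperspectral_roi(cube, background, diff_thresh=0.05, vote_frac=0.5):
    # cube, background: (bands, height, width) reflectance arrays
    per_band_fg = np.abs(cube - background) > diff_thresh      # one foreground mask per band
    votes = per_band_fg.mean(axis=0)                           # fraction of bands voting foreground
    return votes >= vote_frac                                  # consensus ROI mask (height, width)

bands, h, w = 50, 64, 64
background = np.random.rand(bands, h, w) * 0.02
cube = background.copy()
cube[:, 20:40, 20:40] += 0.3                                   # simulated sample material
roi = hyperspectral_roi(cube, background)
print(roi.sum(), "pixels in the consensus ROI")
```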
Citations: 2
A Tip-Tilt Mirror Control System for Partial Image Correction at UC Mount John Observatory
Pub Date : 2020-11-25 DOI: 10.1109/IVCNZ51579.2020.9290543
Jiayu Liu, Vishnu Anand Muruganandan, R. Clare, María Celeste Ramírez Trujillo, S. Weddell
Astronomical images captured by ground-based telescopes, including those at the University of Canterbury Mount John Observatory, are distorted by atmospheric turbulence. The major constituents of atmospheric distortion are tip-tilt aberrations. The solution for achieving higher resolution is to develop and install a tip-tilt mirror control system on ground-based telescopes. A real-time tip-tilt mirror control system measures and corrects tip-tilt aberrations in optical wavefronts. It effectively minimises the perturbation of the star image when observing with the aid of a telescope. To the best of our knowledge, this is the first tip-tilt mirror control system to be applied at a New Zealand astronomical observatory. This would extend the possibilities of correcting higher-order aberrations for 0.5 to 1.0 metre class ground-based telescopes.
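The measurement half of such a loop, estimating tip-tilt as the displacement of the star image centroid from a reference position and converting it into a proportional mirror command, can be sketched as follows; the gain and pixel-to-arcsecond scale are illustrative assumptions, not the observatory system's calibration.

```python
# Sketch of tip-tilt estimation from a star image: the centroid offset from a reference
# position is treated as the tip-tilt error and fed to a proportional controller.
# Gain and pixel scale are illustrative, not the real system's calibration.
import numpy as np

def centroid(image):
    total = image.sum()
    ys, xs = np.mgrid[0:image.shape[0], 0:image.shape[1]]
    return np.array([(xs * image).sum() / total, (ys * image).sum() / total])

def tip_tilt_command(image, reference_xy, gain=0.5, arcsec_per_pixel=0.3):
    error_px = centroid(image) - np.asarray(reference_xy)     # star image motion in pixels
    return -gain * error_px * arcsec_per_pixel                # mirror command (x tilt, y tilt)

frame = np.zeros((64, 64))
frame[30:34, 35:39] = 1.0                                      # displaced star spot
cmd = tip_tilt_command(frame, reference_xy=(32.0, 32.0))
print(cmd)
```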
Citations: 1
Plant Trait Segmentation for Plant Growth Monitoring
Pub Date : 2020-11-25 DOI: 10.1109/IVCNZ51579.2020.9290575
Abhipray Paturkar, G. S. Gupta, D. Bailey
3D point cloud segmentation is an important step for plant phenotyping applications. The segmentation should be able to robustly separate the various plant components, such as leaves and stems, to enable traits to be measured. It is also important for the segmentation method to work on a range of plant architectures with good accuracy and computation time. In this paper, we propose a segmentation method using Euclidean distance to segment point clouds generated using a structure-from-motion algorithm. The proposed algorithm requires no prior information about the point cloud. Experimental results illustrate that our proposed method can effectively segment the plant point cloud irrespective of its architecture and growth stage. The proposed method outperforms the standard methods in terms of computation time and segmentation quality.
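A minimal sketch of Euclidean-distance clustering over a point cloud, in the spirit of the proposed segmentation, is shown below: points closer than a radius are linked and each connected component becomes one segment. The radius, minimum cluster size and the toy point cloud are illustrative assumptions.

```python
# Sketch of Euclidean clustering: points closer than a radius are linked, and each
# connected component becomes one segment (e.g. a leaf or a stem section).
# The radius, minimum cluster size and the toy cloud are illustrative assumptions.
import numpy as np
from scipy.spatial import cKDTree
from collections import deque

def euclidean_clusters(points, radius=0.02, min_size=10):
    tree = cKDTree(points)
    labels = np.full(len(points), -1, dtype=int)
    current = 0
    for seed in range(len(points)):
        if labels[seed] != -1:
            continue
        labels[seed] = current
        queue = deque([seed])
        while queue:                                  # flood fill within the radius
            idx = queue.popleft()
            for nb in tree.query_ball_point(points[idx], radius):
                if labels[nb] == -1:
                    labels[nb] = current
                    queue.append(nb)
        current += 1
    sizes = np.bincount(labels)                       # mark very small clusters as noise (-1)
    labels[np.isin(labels, np.where(sizes < min_size)[0])] = -1
    return labels

cloud = np.vstack([np.random.rand(200, 3) * 0.05,             # one dense blob
                   np.random.rand(200, 3) * 0.05 + 0.5])       # a second, well separated blob
print(np.unique(euclidean_clusters(cloud)))
```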
Citations: 3