
Latest publications from 2018 Digital Image Computing: Techniques and Applications (DICTA)

An Exploration of Deep Transfer Learning for Food Image Classification
Pub Date: 2018-12-01 DOI: 10.1109/DICTA.2018.8615812
K. Islam, S. Wijewickrema, Masud Pervez, S. O'Leary
Image classification is an important problem in computer vision research and is useful in applications such as content-based image retrieval and automated detection systems. In recent years, extensive research has been conducted in this field to classify different types of images. In this paper, we investigate one such domain, namely, food image classification. Classification of food images is useful in applications such as waiter-less restaurants and dietary intake calculators. To this end, we explore the use of pre-trained deep convolutional neural networks (DCNNs) in two ways. First, we use transfer learning and re-train the DCNNs on food images. Second, we extract features from pre-trained DCNNs to train conventional classifiers. We also introduce a new food image database based on Australian dietary guidelines. We compare the performance of these methods on existing databases and the one introduced here. We show that similar levels of accuracy are obtained in both methods, but the training time for the latter is significantly lower. We also perform a comparison with existing methods and show that the methods explored here are comparably accurate to existing methods.
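The second strategy in the abstract (features from a frozen pre-trained DCNN feeding a conventional classifier) can be sketched in miniature. Everything below is illustrative: a fixed random ReLU projection stands in for a pre-trained network, toy Gaussian vectors stand in for food images, and a nearest-centroid rule stands in for the conventional classifier.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for a frozen pre-trained DCNN: a fixed random ReLU projection.
W = rng.normal(size=(64, 16))

def frozen_features(x):
    # Features are extracted but never re-trained, mirroring strategy two.
    return np.maximum(x @ W, 0.0)

# Toy "food image" vectors for two classes (real inputs would be images).
X0 = rng.normal(loc=0.0, size=(50, 64))
X1 = rng.normal(loc=1.0, size=(50, 64))

# Conventional classifier on top of the frozen features: nearest class centroid.
c0 = frozen_features(X0).mean(axis=0)
c1 = frozen_features(X1).mean(axis=0)

def predict(x):
    f = frozen_features(x)
    return (np.linalg.norm(f - c1, axis=1) < np.linalg.norm(f - c0, axis=1)).astype(int)

accuracy = np.concatenate([predict(X0) == 0, predict(X1) == 1]).mean()
print(accuracy)  # close to 1.0 on this well-separated toy data
```

Because only the lightweight classifier is fit, training cost is low, which matches the abstract's observation that the feature-extraction route trains much faster than full re-training.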
Citations: 14
EPD Similarity Measure and Demons Algorithm for Object-Based Motion Estimation
Pub Date: 2018-12-01 DOI: 10.1109/DICTA.2018.8615826
Md. Asikuzzaman, A. Suman, M. Pickering
Reduction of the temporal redundancies among frames, which can be achieved by proper motion-compensated prediction, is the key to efficient video compression. Image registration is a technique that can be exploited to find the motion between frames. As the motion of an individual scene in a frame varies across time, it is important to find the motion of each individual object for efficient motion-compensated prediction, instead of finding the global motion of a video frame as has been done in the video coding literature. In this paper, we propose a motion estimation technique for video coding that estimates the correct motion of each individual object rather than the motion of the combination of objects in the frame. This method adopts a registration technique using a new edge position difference (EPD) similarity measure to separate the regions of individual objects in the frame. We then apply either EPD-based registration or the Demons registration algorithm to estimate the true motion of each object in the frame. Experimental results show that the proposed EPD-Demons registration algorithm achieves superior motion-compensated prediction of a frame compared to the global motion estimation-based approach.
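The EPD measure itself is defined in the paper; as a rough illustration of the general idea of comparing edge positions, the hypothetical score below averages, for each edge pixel in one image, the distance to the nearest edge pixel in the other, symmetrised over both directions. The gradient-threshold edge detector and the symmetrised mean are assumptions for this sketch, not the paper's formulation.

```python
import numpy as np

def edge_positions(img, thresh=0.25):
    # Crude edge map: threshold the gradient magnitude (illustrative only).
    gy, gx = np.gradient(img.astype(float))
    return np.argwhere(np.hypot(gx, gy) > thresh)

def epd(img_a, img_b):
    # Hypothetical edge-position-difference score: symmetrised mean distance
    # from each edge pixel in one image to the nearest edge pixel in the other.
    ea, eb = edge_positions(img_a), edge_positions(img_b)
    d = np.linalg.norm(ea[:, None, :] - eb[None, :, :], axis=2)
    return float(0.5 * (d.min(axis=1).mean() + d.min(axis=0).mean()))

# A bright square, and the same square shifted down by two pixels.
a = np.zeros((32, 32)); a[8:16, 8:16] = 1.0
b = np.zeros((32, 32)); b[10:18, 8:16] = 1.0

print(epd(a, a))       # 0.0: identical edge sets
print(epd(a, b) > 0)   # True: shifted edges give a small positive score
```

A score of this form falls as the projected edges line up, so it can drive a registration search in the same spirit as the EPD-based step described above.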
Citations: 4
Whitening Pre-Filters with Circular Symmetry for Anomaly Detection in Hyperspectral Imagery
Pub Date: 2018-12-01 DOI: 10.1109/DICTA.2018.8615818
H. L. Kennedy
The Reed-Xiaoli anomaly-detector assumes that spectral samples are distributed as a multivariate Gaussian, which is rarely the case for real data. In this paper it is shown that a spatial pre-filter, with stop- and pass-bands that are tuned to the expected texture in the scene and the scale of the target (respectively), may be used to support this approximation, by decorrelating structured background and attenuating noise. For this purpose, a novel procedure for the design of two-dimensional (2-D) spatial filters, with a finite impulse response (FIR), is proposed. Expressing the optimal spatial filter as a linear combination of a few annular basis-functions with circular symmetry, instead of many shifted unit impulses, degrades the integral-squared error (ISE) of the least-squares solution because there are fewer degrees of freedom but improves the isotropy (ISO) of the filter response. Simulation is used to show that optimal filters with a near-zero ISE and near-unity ISO (i.e. with circular symmetry) have the potential to increase the power of hyperspectral anomaly detectors, by reducing the background variance in each channel.
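The Reed-Xiaoli detector that the pre-filter is meant to support is standard and can be sketched directly (the annular-basis filter design itself is not reproduced here): each spectral sample is scored by its Mahalanobis distance from the background statistics, which is exactly where the Gaussian assumption enters.

```python
import numpy as np

def rx_scores(pixels):
    # Classic Reed-Xiaoli detector: Mahalanobis distance of every spectral
    # sample from the scene mean under the sample covariance. The paper's
    # whitening pre-filter would be applied to the imagery before this stage.
    mu = pixels.mean(axis=0)
    cov_inv = np.linalg.inv(np.cov(pixels, rowvar=False))
    diff = pixels - mu
    return np.einsum('ij,jk,ik->i', diff, cov_inv, diff)

rng = np.random.default_rng(1)
background = rng.normal(size=(500, 8))        # 8-band Gaussian background
anomaly = rng.normal(loc=5.0, size=(1, 8))    # one spectrally bright outlier
scene = np.vstack([background, anomaly])
scores = rx_scores(scene)
print(int(np.argmax(scores)))  # 500: the appended anomaly scores highest
```

Reducing the per-channel background variance, as the proposed pre-filter does, directly sharpens this statistic, which is why the filter increases detector power.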
Citations: 0
Driving Lane Detection Based on Recognition of Road Boundary Situation
Pub Date: 2018-12-01 DOI: 10.1109/DICTA.2018.8615784
Hiroyuki Komori, K. Onoguchi
This paper presents a method that recognizes the road boundary situation from a single image and detects the driving lane based on the recognition result. Driving lane detection is important for lateral motion control of the vehicle and is usually realized through lane-mark detection. However, there are some roads where lane marks such as white lines are not drawn. Also, when the road is covered with snow, lane marks cannot be seen. In these cases, it is necessary to detect the boundary line between the roadside object and the road surface. Since traffic lanes are divided by various roadside objects, such as curbs, grass, walls and so on, it is difficult to detect all kinds of road boundaries, including lane marks, with a single algorithm. Therefore, we propose a method that changes the driving lane detection method according to the road boundary situation. First, the road boundary situation is identified as one of several classes, such as white line, curb, grass and so on, by a Convolutional Neural Network (CNN). Then, based on this result, the lane mark or the boundary between the road surface and the roadside object is detected as the lane boundary. When a clear lane mark is drawn on the road, the situation is identified as the "White line" class and the lane mark is detected as the lane boundary. Otherwise, the situation is identified as another class, and the boundary of the roadside object corresponding to that class is detected as the lane boundary. Experimental results on the KITTI dataset and our own dataset show the effectiveness of the proposed method. In addition, the result of the proposed method is compared with the boundary of the road area extracted by a semantic segmentation method.
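The class-dependent switching described above can be sketched as a simple dispatch: the road-boundary class predicted by the CNN selects which boundary detector runs. The class names follow the abstract; the detector bodies and their return values are placeholders, not the paper's algorithms.

```python
# Hypothetical placeholder detectors, one per road-boundary situation.
def detect_white_line(image):
    return "lane-mark boundary"

def detect_curb(image):
    return "curb/road-surface boundary"

def detect_grass(image):
    return "grass/road-surface boundary"

# Map each CNN-predicted class to the detector suited to that situation.
DETECTORS = {
    "white line": detect_white_line,
    "curb": detect_curb,
    "grass": detect_grass,
}

def driving_lane_boundary(image, boundary_class):
    # boundary_class would come from the CNN road-boundary classifier.
    return DETECTORS[boundary_class](image)

print(driving_lane_boundary(None, "curb"))
```

The point of the design is that no single detector must handle every boundary type; the classifier routes each frame to a specialized detector.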
Citations: 4
Mapping of Rice Varieties with Sentinel-2 Data via Deep CNN Learning in Spectral and Time Domains
Pub Date: 2018-12-01 DOI: 10.1109/DICTA.2018.8615872
Yiqing Guo, X. Jia, D. Paull
Generating rice variety distribution maps with remote sensing image time series provides meaningful information for intelligent management of rice farms and precise budgeting of irrigation water. However, as different rice varieties share highly similar spectral/temporal patterns, distinguishing one variety from another is highly challenging. In this study, a deep convolutional neural network (deep CNN) is constructed in both spectral and time domains. The purpose is to learn the fine features of each rice variety in terms of its spectral reflectance characteristics and growing phenology, which is a new attempt aiming for agriculture intelligence. An experiment was conducted at a major rice planting area in southwest New South Wales, Australia, during the 2016–17 rice growing season. Based on a ground reference map of rice variety distribution, more than one million labelled samples were collected. Five rice varieties currently grown in the study area are investigated and they are Reiziq, Sherpa, Topaz, YRM 70, and Langi. A time series of multitemporal remote sensing images recorded by the Multispectral Instrument (MSI) on-board the Sentinel-2A satellite was used as inputs. These images covered the entire rice growing season from November 2016 to May 2017. Experimental results showed that a good overall accuracy of 92.87% was achieved with the proposed approach, outperforming a standard support vector machine classifier that produced an accuracy of 57.49%. The Sherpa variety showed the highest producer's accuracy (98.46%), while the highest user's accuracy was observed for the Reiziq variety (97.93%). The results obtained with the proposed deep CNN learning provide the prospect of applying remote sensing image time series for rice variety mapping in an operational context in future.
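The spectral-and-time-domain input such a network convolves over can be sketched as a data-layout step: each pixel of the image time series becomes a (bands x dates) 2-D sample. The band and date counts below are illustrative stand-ins, not the paper's exact Sentinel-2 configuration.

```python
import numpy as np

# Illustrative time series: 12 acquisition dates, 10 spectral bands, 4x4 pixels.
n_dates, n_bands, H, W = 12, 10, 4, 4
series = np.random.default_rng(2).normal(size=(n_dates, n_bands, H, W))

# Rearrange so each pixel is one (bands x dates) sample: the natural input
# for a CNN that convolves in both the spectral and time domains.
samples = series.transpose(2, 3, 1, 0).reshape(H * W, n_bands, n_dates)
print(samples.shape)  # (16, 10, 12): one 2-D sample per pixel
```

Labelled samples of this shape, drawn from the ground reference map, are what the classifier would be trained on.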
Citations: 5
Animal Call Recognition with Acoustic Indices: Little Spotted Kiwi as a Case Study
Pub Date: 2018-12-01 DOI: 10.1109/DICTA.2018.8615857
Hongxiao Gan, M. Towsey, Yuefeng Li, Jinglan Zhang, P. Roe
Long-duration recordings of the natural environment are very useful in monitoring animal diversity. After accumulating weeks or even months of recordings, ecologists need an efficient tool to recognize the species present in those recordings. Automated species recognizers are developed to interpret field-collected recordings and quickly identify species. However, the repetitive work of designing and selecting features for different species is becoming a serious problem for ecologists. This situation creates a demand for generic recognizers that perform well on multiple animal calls. Meanwhile, acoustic indices have been proposed to summarize the structure and distribution of acoustic energy in natural environment recordings. They are designed to assess the acoustic activity of animal habitats and are not biased toward any particular species. That characteristic makes them natural generic features for recognizers. In this study, we explore the potential of acoustic indices as generic features and build a kiwi call recognizer with them as a case study. We propose a kiwi call recognizer built with a Multilayer Perceptron (MLP) classifier and acoustic index features. Experimental results on 13 hours of kiwi call recordings show that our recognizer performs well in terms of precision, recall and F1 measure. This study shows that acoustic indices have the potential to serve as generic features that can discriminate multiple animal calls.
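As an example of the kind of feature involved, the sketch below computes a normalised spectral entropy, one widely used acoustic-index family: it summarizes how acoustic energy is spread across frequency without targeting any species. The paper uses a suite of published indices, so treat this single index as illustrative only.

```python
import numpy as np

def spectral_entropy(signal, n_fft=256):
    # Normalised entropy of the frame-averaged power spectrum: near 0 for a
    # pure tone (energy in one bin), near 1 for broadband noise.
    frames = signal[: len(signal) // n_fft * n_fft].reshape(-1, n_fft)
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    p = power.mean(axis=0)
    p = p / p.sum()
    return float(-(p * np.log2(p + 1e-12)).sum() / np.log2(len(p)))

rng = np.random.default_rng(3)
t = np.arange(16384) / 16000.0
tone = np.sin(2 * np.pi * 1000 * t)   # tonal call energy: low entropy
noise = rng.normal(size=t.size)       # broadband noise: high entropy
print(spectral_entropy(tone) < spectral_entropy(noise))  # True
```

A vector of such per-segment indices is the kind of species-agnostic input an MLP recognizer like the one above could be trained on.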
Citations: 2
Using Edge Position Difference and Pixel Correlation for Aligning Stereo-Camera Generated 3D Scans
Pub Date: 2018-12-01 DOI: 10.1109/DICTA.2018.8615836
Deepak Rajamohan, M. Pickering, M. Garratt
The projection of a textured 3D scan, with a fixed scale, will spatially align with the 2D image of the scanned scene only at a unique pose of the scan. If misaligned, the true 3D alignment can be estimated using information from a 2D-2D registration process that minimizes an appropriate error criterion by penalizing mismatch between the overlapping images. Scan data from complicated real-world scenes poses a challenging registration problem due to the tendency of the optimization procedure to become trapped in local minima. In addition, the 3D scan from a stereo camera is of very high resolution and shows mild geometric distortion, adding to the difficulty. This work presents a new registration process using a similarity measure named Edge Position Difference (EPD) combined with a pixel-based correlation similarity measure. Together, these allow the technique to show consistent and robust 3D-2D registration performance using stereo data, showcasing the potential for extending the technique to practical large-scale mapping applications.
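The pixel-correlation half of such a combined similarity measure is commonly realised as normalised cross-correlation between the projected scan and the camera image; a minimal sketch of that score follows (how it is combined with EPD is described in the paper, not here).

```python
import numpy as np

def ncc(a, b):
    # Normalised cross-correlation between two equally sized patches:
    # 1.0 for identical patches, near 0 for unrelated ones.
    a = a - a.mean()
    b = b - b.mean()
    return float((a * b).sum() / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

rng = np.random.default_rng(4)
img = rng.normal(size=(16, 16))
other = rng.normal(size=(16, 16))
print(round(ncc(img, img), 3))      # 1.0 for identical patches
print(abs(ncc(img, other)) < 0.5)   # unrelated patches score near zero
```

An intensity score of this kind complements an edge-position term: correlation handles textured regions, while edge distance stays informative where intensities differ between sensor and scan.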
Citations: 1
Bay Lobsters Moulting Stage Analysis Based on High-Order Texture Descriptor
Pub Date: 2018-12-01 DOI: 10.1109/DICTA.2018.8615832
M. Asif, Yongsheng Gao, Jun Zhou
In this paper, we introduce the world's first method to automatically classify the moulting stage of bay lobsters, formally known as Thenus orientalis, in a controlled environment. Our classification approach only requires top-view images of the exoskeleton of bay lobsters. We analyze the texture of the exoskeleton to categorize it into normal, moulting-stage, and freshly moulted classes. To meet the efficiency and robustness requirements of a production platform, we leverage traditional approaches such as the Local Binary Pattern and Local Derivative Pattern with an enhanced encoding scheme for underwater imagery. We also build a dataset of 315 bay lobster images captured in a controlled underwater environment. Experimental results on this dataset demonstrate that the proposed method can effectively classify bay lobsters with high accuracy.
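The plain 8-neighbour Local Binary Pattern histogram, the baseline that enhanced encodings like the paper's build on, can be sketched as follows; the underwater-specific encoding itself is not reproduced here.

```python
import numpy as np

def lbp_histogram(img):
    # Basic LBP: compare each interior pixel with its 8 neighbours, pack the
    # comparisons into an 8-bit code, and histogram the 256 possible codes.
    c = img[1:-1, 1:-1]
    neighbours = [img[0:-2, 0:-2], img[0:-2, 1:-1], img[0:-2, 2:],
                  img[1:-1, 2:], img[2:, 2:], img[2:, 1:-1],
                  img[2:, 0:-2], img[1:-1, 0:-2]]
    codes = np.zeros_like(c, dtype=np.uint8)
    for bit, n in enumerate(neighbours):
        codes |= (n >= c).astype(np.uint8) << bit
    return np.bincount(codes.ravel(), minlength=256)

rng = np.random.default_rng(5)
texture = rng.integers(0, 256, size=(32, 32))
hist = lbp_histogram(texture)
print(hist.sum())  # 900: one code per interior pixel of the 32x32 patch
```

A texture histogram of this kind, computed over exoskeleton patches, is the sort of descriptor a moulting-stage classifier would consume.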
Citations: 1
Kernel Support Vector Machines and Convolutional Neural Networks
Pub Date : 2018-12-01 DOI: 10.1109/DICTA.2018.8615840
Shihao Jiang, R. Hartley, Basura Fernando
Convolutional Neural Networks (CNNs) have achieved great success in various computer vision tasks due to their strong feature extraction ability. The trend in CNN architecture design is to increase depth so as to increase feature extraction ability. Kernel Support Vector Machines (SVMs), on the other hand, are known to give optimal separating surfaces through their ability to automatically select support vectors and perform classification in higher-dimensional spaces. We investigate the idea of combining the two so that the best of both worlds can be achieved and a more compact model can perform as well as deeper CNNs. In the past, attempts have been made to use CNNs to extract features from images and then classify them with a kernel SVM, but this process was performed in two separate steps. In this paper, we propose a single model in which a CNN and a kernel SVM are integrated and can be trained end-to-end. In particular, we propose a fully differentiable Radial Basis Function (RBF) layer that can be seamlessly adapted to a CNN environment and forms a better classifier than the usual linear classifier. Due to end-to-end training, our approach allows the initial layers of the CNN to extract features better adapted to the kernel SVM classifier. Our experiments demonstrate that the hybrid CNN-kSVM model gives superior results to a plain CNN model, and also performs better than the method where feature extraction and classification are performed in separate stages, by a CNN and a kernel SVM respectively.
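A minimal NumPy sketch of the forward pass of such an RBF layer, assuming the standard Gaussian form φ_j(x) = exp(−γ‖x − c_j‖²). This is illustrative only: the paper's layer would compute the same quantity inside a deep-learning framework so that gradients flow to the inputs, the centres, and γ; the name `rbf_layer` is an assumption, not from the paper.

```python
import numpy as np

def rbf_layer(x, centers, gamma):
    """Forward pass of a Gaussian RBF layer.

    x:       (n_samples, d) input features (e.g. CNN activations)
    centers: (n_centers, d) learnable RBF centres
    Returns  (n_samples, n_centers) activations exp(-gamma * ||x - c||^2).

    Every operation here (broadcasted subtraction, square, sum, exp) is
    differentiable, which is what lets the layer be trained end-to-end
    on top of a CNN instead of classifying in a separate SVM step.
    """
    diff = x[:, None, :] - centers[None, :, :]   # (n_samples, n_centers, d)
    sq_dist = np.sum(diff ** 2, axis=-1)         # squared Euclidean distances
    return np.exp(-gamma * sq_dist)

x = np.array([[0.0, 0.0],
              [1.0, 1.0]])
centers = np.array([[0.0, 0.0],
                    [1.0, 0.0]])
phi = rbf_layer(x, centers, gamma=1.0)
# phi[0, 0] == exp(0) == 1.0: the first sample coincides with the first centre
```

A linear classifier applied to `phi` then acts like a kernel SVM whose support vectors are the (learned) centres.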
Citations: 4
Virtual View Quality Enhancement using Side View Temporal Modelling Information for Free Viewpoint Video
Pub Date : 2018-12-01 DOI: 10.1109/DICTA.2018.8615827
D. M. Rahaman, M. Paul, N. J. Shoumy
Virtual viewpoint video needs to be synthesised from adjacent reference viewpoints to provide an immersive, perceptually 3D viewing experience of a scene. View synthesis techniques suffer from poor rendering quality due to holes created by occlusion in the warping process. Currently, the spatial and temporal correlation of texture images and depth maps is exploited to improve the quality of the final synthesised view. Due to the low spatial correlation at the edges between foreground and background pixels, spatial-correlation techniques such as inpainting and inverse mapping (IM) cannot fill holes effectively. Conversely, exploiting the temporal correlation among already-synthesised frames, learned through Gaussian mixture modelling (GMM), fills missing pixels in occluded areas efficiently. However, no frames are available for GMM learning when the user switches view instantly. To address these issues, the proposed view synthesis technique applies GMM to the adjacent reference viewpoint texture images and depth maps to generate a most common frame in a scene (McFIS). The texture McFIS is then warped into the target viewpoint using the depth McFIS, and both warped McFISes are merged. We then utilise the number of GMM models to refine the pixel intensities of the synthesised view, using a weighting factor between the pixel intensities of the merged McFIS and the warped images. This technique provides better pixel correspondence and improves PSNR by 0.58∼0.70dB compared to the IM technique.
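To illustrate the idea of a "most common frame", here is a deliberately simplified per-pixel sketch: instead of a full Gaussian mixture model, each pixel takes the most frequent coarsely quantised intensity across a stack of frames, which approximates selecting the dominant GMM mode. The function name `most_common_frame` and the `n_levels` quantisation are illustrative assumptions, not details from the paper.

```python
import numpy as np

def most_common_frame(frames, n_levels=8):
    """Approximate a McFIS-style 'most common frame' from a frame stack.

    frames: (t, h, w) uint8 array of co-located frames.
    For each pixel, intensities are quantised into n_levels bins, the
    most frequent bin is found (a cheap stand-in for the dominant GMM
    component), and the mean intensity of the frames in that bin is
    used as the pixel's background value.
    """
    t, h, w = frames.shape
    bins = (frames.astype(np.int32) * n_levels) // 256  # quantise intensities
    mcfis = np.empty((h, w), dtype=frames.dtype)
    for i in range(h):
        for j in range(w):
            vals = bins[:, i, j]
            mode = np.bincount(vals, minlength=n_levels).argmax()
            # representative intensity: mean of frames falling in the mode bin
            mcfis[i, j] = frames[vals == mode, i, j].mean()
    return mcfis

# a pixel seen as 100 in four frames and 200 in one resolves to background 100
frames = np.full((5, 2, 2), 100, dtype=np.uint8)
frames[2] = 200
background = most_common_frame(frames)
```

In the actual method this background model is built per pixel with a GMM, and the number of learned components further weights how strongly the McFIS intensity overrides the warped-image intensity when filling occlusion holes.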
Citations: 1