An Exploration of Deep Transfer Learning for Food Image Classification
Pub Date : 2018-12-01  DOI: 10.1109/DICTA.2018.8615812
K. Islam, S. Wijewickrema, Masud Pervez, S. O'Leary
Image classification is an important problem in computer vision research and is useful in applications such as content-based image retrieval and automated detection systems. In recent years, extensive research has been conducted in this field to classify different types of images. In this paper, we investigate one such domain, namely, food image classification. Classification of food images is useful in applications such as waiter-less restaurants and dietary intake calculators. To this end, we explore the use of pre-trained deep convolutional neural networks (DCNNs) in two ways. First, we use transfer learning and re-train the DCNNs on food images. Second, we extract features from pre-trained DCNNs to train conventional classifiers. We also introduce a new food image database based on Australian dietary guidelines. We compare the performance of these methods on existing databases and the one introduced here. We show that both methods achieve similar levels of accuracy, but the training time for the latter is significantly lower. We also compare against existing methods and show that the approaches explored here achieve comparable accuracy.
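As an illustration of the second approach (features from a pre-trained DCNN fed to a conventional classifier), the following sketch extracts ResNet-50 features with PyTorch and trains a linear SVM. The backbone choice, the weights argument, and the data handling are assumptions for illustration, not details taken from the paper.

```python
# Minimal sketch, assuming torchvision >= 0.13 and scikit-learn are available.
import torch
import torchvision.models as models
from sklearn.svm import LinearSVC

# Load a pre-trained network and drop its classification head so that a
# forward pass yields a fixed-length feature vector per image.
backbone = models.resnet50(weights="IMAGENET1K_V1")
backbone.fc = torch.nn.Identity()
backbone.eval()

def extract_features(batch):
    # batch: (N, 3, 224, 224) tensor of normalised food images.
    with torch.no_grad():
        return backbone(batch).numpy()   # (N, 2048) feature matrix

# train_images and train_labels are assumed to be prepared elsewhere:
# features = extract_features(train_images)
# clf = LinearSVC().fit(features, train_labels)
```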
{"title":"An Exploration of Deep Transfer Learning for Food Image Classification","authors":"K. Islam, S. Wijewickrema, Masud Pervez, S. O'Leary","doi":"10.1109/DICTA.2018.8615812","DOIUrl":"https://doi.org/10.1109/DICTA.2018.8615812","url":null,"abstract":"Image classification is an important problem in computer vision research and is useful in applications such as content-based image retrieval and automated detection systems. In recent years, extensive research has been conducted in this field to classify different types of images. In this paper, we investigate one such domain, namely, food image classification. Classification of food images is useful in applications such as waiter-less restaurants and dietary intake calculators. To this end, we explore the use of pre-trained deep convolutional neural networks (DCNNs) in two ways. First, we use transfer learning and re-train the DCNNs on food images. Second, we extract features from pre-trained DCNNs to train conventional classifiers. We also introduce a new food image database based on Australian dietary guidelines. We compare the performance of these methods on existing databases and the one introduced here. We show that similar levels of accuracy are obtained in both methods, but the training time for the latter is significantly lower. We also perform a comparison with existing methods and show that the methods explored here are comparably accurate to existing methods.","PeriodicalId":130057,"journal":{"name":"2018 Digital Image Computing: Techniques and Applications (DICTA)","volume":"48 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134603618","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
EPD Similarity Measure and Demons Algorithm for Object-Based Motion Estimation
Pub Date : 2018-12-01  DOI: 10.1109/DICTA.2018.8615826
Md. Asikuzzaman, A. Suman, M. Pickering
Reducing the temporal redundancy among frames, which can be achieved by proper motion-compensated prediction, is the key to efficient video compression. Image registration is a technique that can be exploited to find the motion between frames. As the motion of individual objects in a frame varies across time, it is important to find the motion of each object for efficient motion-compensated prediction, instead of finding the global motion of a video frame as has been done in the video coding literature. In this paper, we propose a motion estimation technique for video coding that estimates the correct motion of each individual object rather than the motion of the combination of objects in the frame. The method adopts a registration technique using a new edge position difference (EPD) similarity measure to separate the regions of individual objects in the frame. We then apply either EPD-based registration or the Demons registration algorithm to estimate the true motion of each object. Experimental results show that the proposed EPD-Demons registration algorithm achieves superior motion-compensated prediction of a frame compared to the global motion estimation-based approach.
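The abstract does not give the exact EPD formulation, so the following is only a hedged stand-in: it scores two images by how much their binary edge maps disagree, which captures the spirit of an edge-position-based similarity measure.

```python
# Illustrative edge-position disagreement score (assumed form, not the paper's).
import numpy as np
from scipy import ndimage

def edge_mask(img, rel_thresh=0.1):
    # Sobel gradient magnitude, thresholded to a binary edge map.
    gx = ndimage.sobel(img, axis=1)
    gy = ndimage.sobel(img, axis=0)
    mag = np.hypot(gx, gy)
    return mag > rel_thresh * mag.max()

def edge_position_difference(reference, moving):
    # Fraction of pixels where the two edge maps disagree; lower means
    # better alignment between the reference and the motion-compensated frame.
    return np.logical_xor(edge_mask(reference), edge_mask(moving)).mean()
```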
{"title":"EPD Similarity Measure and Demons Algorithm for Object-Based Motion Estimation","authors":"Md. Asikuzzaman, A. Suman, M. Pickering","doi":"10.1109/DICTA.2018.8615826","DOIUrl":"https://doi.org/10.1109/DICTA.2018.8615826","url":null,"abstract":"Reduction of the temporal redundancies among frames, which can be achieved by the proper motion-compensated prediction, is the key to efficient video compression. Image registration is a technique, which can be exploited to find the motion between the frames. As the motion of an individual scene in a frame is varying across time, it is important to find the motion of the individual object for efficient motion-compensated prediction instead of finding the global motion in a video frame as has been used in the video coding literature. In this paper, we propose a motion estimation technique for video coding that estimates the correct motion of the individual object rather than estimating the motion of the combination of objects in the frame. This method adopts a registration technique using a new edge position difference (EPD) similarity measure to separate the region of individual objects in the frame. Then we apply either EPD-based registration or the Demons registration algorithm to estimate the true motion of each object in the frame. Experimental results show that the proposed EPD-Demons registration algorithm achieves superior motion-compensated prediction of a frame when compared to the global motion estimation-based approach.","PeriodicalId":130057,"journal":{"name":"2018 Digital Image Computing: Techniques and Applications (DICTA)","volume":"75 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124667443","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Whitening Pre-Filters with Circular Symmetry for Anomaly Detection in Hyperspectral Imagery
Pub Date : 2018-12-01  DOI: 10.1109/DICTA.2018.8615818
H. L. Kennedy
The Reed-Xiaoli anomaly detector assumes that spectral samples are distributed as a multivariate Gaussian, which is rarely the case for real data. In this paper it is shown that a spatial pre-filter, with stop- and pass-bands tuned to the expected texture in the scene and to the scale of the target (respectively), may be used to support this approximation by decorrelating structured background and attenuating noise. For this purpose, a novel procedure for the design of two-dimensional (2-D) spatial filters with a finite impulse response (FIR) is proposed. Expressing the optimal spatial filter as a linear combination of a few annular basis functions with circular symmetry, instead of many shifted unit impulses, degrades the integral-squared error (ISE) of the least-squares solution because there are fewer degrees of freedom, but improves the isotropy (ISO) of the filter response. Simulation is used to show that optimal filters with a near-zero ISE and near-unity ISO (i.e. with circular symmetry) have the potential to increase the power of hyperspectral anomaly detectors by reducing the background variance in each channel.
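For context, the Reed-Xiaoli detector mentioned above scores each pixel by its Mahalanobis distance from the scene mean under a multivariate Gaussian background model; a minimal sketch follows (the pre-filter design itself is not reproduced here).

```python
# Standard RX anomaly scores for a (rows, cols, bands) hyperspectral cube.
import numpy as np

def rx_scores(cube):
    rows, cols, bands = cube.shape
    X = cube.reshape(-1, bands).astype(float)
    mu = X.mean(axis=0)
    cov_inv = np.linalg.pinv(np.cov(X, rowvar=False))
    d = X - mu
    # Mahalanobis distance of every spectral sample from the background mean.
    scores = np.einsum("ij,jk,ik->i", d, cov_inv, d)
    return scores.reshape(rows, cols)
```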
{"title":"Whitening Pre-Filters with Circular Symmetry for Anomaly Detection in Hyperspectral Imagery","authors":"H. L. Kennedy","doi":"10.1109/DICTA.2018.8615818","DOIUrl":"https://doi.org/10.1109/DICTA.2018.8615818","url":null,"abstract":"The Reed-Xiaoli anomaly-detector assumes that spectral samples are distributed as a multivariate Gaussian, which is rarely the case for real data. In this paper it is shown that a spatial pre-filter, with stop- and pass-bands that are tuned to the expected texture in the scene and the scale of the target (respectively), may be used to support this approximation, by decorrelating structured background and attenuating noise. For this purpose, a novel procedure for the design of two-dimensional (2-D) spatial filters, with a finite impulse response (FIR), is proposed. Expressing the optimal spatial filter as a linear combination of a few annular basis-functions with circular symmetry, instead of many shifted unit impulses, degrades the integral-squared error (ISE) of the least-squares solution because there are fewer degrees of freedom but improves the isotropy (ISO) of the filter response. Simulation is used to show that optimal filters with a near-zero ISE and near-unity ISO (i.e. with circular symmetry) have the potential to increase the power of hyperspectral anomaly detectors, by reducing the background variance in each channel.","PeriodicalId":130057,"journal":{"name":"2018 Digital Image Computing: Techniques and Applications (DICTA)","volume":"42 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124950631","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Driving Lane Detection Based on Recognition of Road Boundary Situation
Pub Date : 2018-12-01  DOI: 10.1109/DICTA.2018.8615784
Hiroyuki Komori, K. Onoguchi
This paper presents a method that recognizes the road boundary situation from a single image and detects the driving lane based on the recognition result. Driving lane detection is important for lateral motion control of the vehicle and is usually realized by lane mark detection. However, on some roads, lane marks such as white lines are not drawn, and when the road is covered with snow, lane marks cannot be seen. In these cases, it is necessary to detect the boundary line between roadside objects and the road surface. Since traffic lanes are divided by various roadside objects, such as curbs, grass and walls, it is difficult to detect all kinds of road boundaries, including lane marks, with a single algorithm. Therefore, we propose a method that changes the driving lane detection method according to the road boundary situation. First, the road boundary situation is classified into classes such as white line, curb and grass by a convolutional neural network (CNN). Then, based on this result, the lane mark or the boundary between the road surface and the roadside object is detected as the lane boundary. When a clear lane mark is drawn on the road, the situation is identified as the "White line" class and a lane mark is detected as the lane boundary. When a lane mark is not present, the situation is identified as one of the other classes and the boundary of the roadside object corresponding to the identified class is detected as the lane boundary. Experimental results using the KITTI dataset and our own dataset show the effectiveness of the proposed method. In addition, the result of the proposed method is compared with the boundary of the road area extracted by a semantic segmentation method.
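A rough sketch of the first stage described above, a CNN that labels the road boundary situation, is shown below; the class set, layer sizes, and input resolution are assumptions rather than the paper's actual architecture.

```python
# Hypothetical boundary-situation classifier (PyTorch).
import torch
import torch.nn as nn

BOUNDARY_CLASSES = ["white_line", "curb", "grass", "wall", "other"]  # assumed set

class BoundaryClassifier(nn.Module):
    def __init__(self, n_classes=len(BOUNDARY_CLASSES)):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.head = nn.Linear(32, n_classes)

    def forward(self, x):
        # x: (N, 3, H, W) road image; returns per-class logits.
        return self.head(self.features(x).flatten(1))

# The predicted class then selects which lane-boundary detector to run.
```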
{"title":"Driving Lane Detection Based on Recognition of Road Boundary Situation","authors":"Hiroyuki Komori, K. Onoguchi","doi":"10.1109/DICTA.2018.8615784","DOIUrl":"https://doi.org/10.1109/DICTA.2018.8615784","url":null,"abstract":"This paper presents the method that recognizes the road boundary situation from a single image and detects a driving lane based on the recognition result. Driving lane detection is important for lateral motion control of the vehicle and it usually realized based on lane mark detection. However, there are some roads where lane marks such as white lines are not drawn. Also, when the road is covered with snow, lane marks cannot be seen. In these cases, it's necessary to detect the boundary line between the roadside object and the road surfaces. Since traffic lanes are divided by various roadside objects, such as curbs, grass, walls and so on, it's difficult to detect all kinds of road boundary including lane marks by a single algorithm. Therefore, we propose the method which changes the driving lane detection method according to the road boundary situation. At first, the situation of the road boundary is identified as some classes, such as white line, curb, grass and so on, by the Convolutional Neural Network (CNN). Then, based on this result, the lane mark or the boundary between the road surface and the roadside object is detected as the lane boundary. When a clear lane mark is drawn on a road, this situation is identified as a class of \"White line\" and a lane mark is detected as a lane boundary. On the other hand, when a lane mark is not present, this situation is identified as the other class and the boundary of the roadside object corresponding to the identified class is detected as the lane boundary. Experimental results using the KITTI dataset and our own dataset show the effectiveness of the proposed method. In addition, the result of the proposed method is compared with the boundary of the road area extracted by some semantic segmentation method.","PeriodicalId":130057,"journal":{"name":"2018 Digital Image Computing: Techniques and Applications (DICTA)","volume":"10 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130343515","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Mapping of Rice Varieties with Sentinel-2 Data via Deep CNN Learning in Spectral and Time Domains
Pub Date : 2018-12-01  DOI: 10.1109/DICTA.2018.8615872
Yiqing Guo, X. Jia, D. Paull
Generating rice variety distribution maps with remote sensing image time series provides meaningful information for intelligent management of rice farms and precise budgeting of irrigation water. However, as different rice varieties share highly similar spectral/temporal patterns, distinguishing one variety from another is highly challenging. In this study, a deep convolutional neural network (deep CNN) is constructed in both the spectral and time domains. The purpose is to learn the fine features of each rice variety in terms of its spectral reflectance characteristics and growing phenology, which is a new attempt at agricultural intelligence. An experiment was conducted at a major rice planting area in southwest New South Wales, Australia, during the 2016–17 rice growing season. Based on a ground reference map of rice variety distribution, more than one million labelled samples were collected. Five rice varieties currently grown in the study area are investigated: Reiziq, Sherpa, Topaz, YRM 70, and Langi. A time series of multitemporal remote sensing images recorded by the Multispectral Instrument (MSI) on board the Sentinel-2A satellite was used as input. These images covered the entire rice growing season from November 2016 to May 2017. Experimental results showed that a good overall accuracy of 92.87% was achieved with the proposed approach, outperforming a standard support vector machine classifier that produced an accuracy of 57.49%. The Sherpa variety showed the highest producer's accuracy (98.46%), while the highest user's accuracy was observed for the Reiziq variety (97.93%). The results obtained with the proposed deep CNN learning provide the prospect of applying remote sensing image time series for rice variety mapping in an operational context in the future.
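As a hedged illustration of a CNN operating jointly over the spectral and temporal axes, the sketch below treats each pixel's Sentinel-2 time series as a bands-by-dates matrix; the band/date counts and layer sizes are assumptions, not the paper's configuration.

```python
# Assumed-shape spectral-temporal CNN sketch (PyTorch).
import torch
import torch.nn as nn

N_BANDS, N_DATES, N_VARIETIES = 10, 20, 5   # assumed dimensions

model = nn.Sequential(
    nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(),
    nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
    nn.Linear(32, N_VARIETIES),
)

x = torch.randn(8, 1, N_BANDS, N_DATES)     # batch of 8 pixel time series
logits = model(x)                           # (8, 5) variety scores
```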
{"title":"Mapping of Rice Varieties with Sentinel-2 Data via Deep CNN Learning in Spectral and Time Domains","authors":"Yiqing Guo, X. Jia, D. Paull","doi":"10.1109/DICTA.2018.8615872","DOIUrl":"https://doi.org/10.1109/DICTA.2018.8615872","url":null,"abstract":"Generating rice variety distribution maps with remote sensing image time series provides meaningful information for intelligent management of rice farms and precise budgeting of irrigation water. However, as different rice varieties share highly similar spectral/temporal patterns, distinguishing one variety from another is highly challenging. In this study, a deep convolutional neural network (deep CNN) is constructed in both spectral and time domains. The purpose is to learn the fine features of each rice variety in terms of its spectral reflectance characteristics and growing phenology, which is a new attempt aiming for agriculture intelligence. An experiment was conducted at a major rice planting area in southwest New South Wales, Australia, during the 2016–17 rice growing season. Based on a ground reference map of rice variety distribution, more than one million labelled samples were collected. Five rice varieties currently grown in the study area are investigated and they are Reiziq, Sherpa, Topaz, YRM 70, and Langi. A time series of multitemporal remote sensing images recorded by the Multispectral Instrument (MSI) on-board the Sentinel-2A satellite was used as inputs. These images covered the entire rice growing season from November 2016 to May 2017. Experimental results showed that a good overall accuracy of 92.87% was achieved with the proposed approach, outperforming a standard support vector machine classifier that produced an accuracy of 57.49%. The Sherpa variety showed the highest producer's accuracy (98.46%), while the highest user's accuracy was observed for the Reiziq variety (97.93%). The results obtained with the proposed deep CNN learning provide the prospect of applying remote sensing image time series for rice variety mapping in an operational context in future.","PeriodicalId":130057,"journal":{"name":"2018 Digital Image Computing: Techniques and Applications (DICTA)","volume":"5 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127398143","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Animal Call Recognition with Acoustic Indices: Little Spotted Kiwi as a Case Study
Pub Date : 2018-12-01  DOI: 10.1109/DICTA.2018.8615857
Hongxiao Gan, M. Towsey, Yuefeng Li, Jinglan Zhang, P. Roe
Long-duration recordings of the natural environment are very useful for monitoring animal diversity. After accumulating weeks or even months of recordings, ecologists need an efficient tool to recognize species in those recordings. Automated species recognizers are developed to interpret field-collected recordings and quickly identify species. However, the repetitive work of designing and selecting features for different species is becoming a serious problem for ecologists. This situation creates a demand for generic recognizers that perform well on multiple animal calls. Meanwhile, acoustic indices have been proposed to summarize the structure and distribution of acoustic energy in natural environment recordings. They are designed to assess the acoustic activity of animal habitats and are not tailored to any particular species. That characteristic makes them natural generic features for recognizers. In this study, we explore the potential of acoustic indices as generic features and build a kiwi call recognizer with them as a case study. We propose a kiwi call recognizer built with a Multilayer Perceptron (MLP) classifier and acoustic index features. Experimental results on 13 hours of kiwi call recordings show that our recognizer performs well in terms of precision, recall and F1 measure. This study shows that acoustic indices have the potential to serve as generic features that can discriminate multiple animal calls.
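To make the pipeline concrete, here is a sketch that computes a few simple per-segment acoustic indices and feeds them to an MLP; the specific indices and network size used in the paper are not given in the abstract, so these are assumptions.

```python
# Hedged sketch: three generic acoustic indices + an MLP classifier.
import numpy as np
from sklearn.neural_network import MLPClassifier

def acoustic_indices(spectrogram):
    # spectrogram: (freq_bins, frames) magnitude spectrogram of one segment.
    p = spectrogram / (spectrogram.sum() + 1e-12)
    entropy = -(p * np.log2(p + 1e-12)).sum()             # spectral entropy
    activity = (spectrogram > spectrogram.mean()).mean()  # fraction of "active" cells
    energy = spectrogram.mean()                           # average energy
    return np.array([entropy, activity, energy])

# segments and labels are assumed to be prepared elsewhere:
# X = np.stack([acoustic_indices(s) for s in segments])
# clf = MLPClassifier(hidden_layer_sizes=(32,)).fit(X, labels)
```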
{"title":"Animal Call Recognition with Acoustic Indices: Little Spotted Kiwi as a Case Study","authors":"Hongxiao Gan, M. Towsey, Yuefeng Li, Jinglan Zhang, P. Roe","doi":"10.1109/DICTA.2018.8615857","DOIUrl":"https://doi.org/10.1109/DICTA.2018.8615857","url":null,"abstract":"Long-duration recordings of the natural environment are very useful in monitoring of animal diversity. After accumulating weeks or even months of recordings, ecologists need an efficient tool to recognize species in those recordings. Automated species recognizers are developed to interpret field-collected recordings and quickly identify species. However, the repetitive work of designing and selecting features for different species is becoming a serious problem for ecologists. This situation creates a demand for generic recognizers that perform well on multiple animal calls. Meanwhile, acoustic indices are proposed to summarize the structure and distribution of acoustic energy in natural environment recordings. They are designed to assess the acoustic activity of animal habitats and do not have discrimination against any species. That characteristic makes them natural generic features for recognizers. In this study, we explore the potential of acoustic indices being generic features and build a kiwi call recognizer with them as a case study. We proposed a kiwi call recognizer built with a Multilayer Perceptron (MLP) classifier and acoustic index features. Experimental results on 13 hours of kiwi call recordings show that our recognizer performs well, in terms of precision, recall and F1 measure. This study shows that acoustic indices have the potential of being generic features that can discriminate multiple animal calls.","PeriodicalId":130057,"journal":{"name":"2018 Digital Image Computing: Techniques and Applications (DICTA)","volume":"559 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127676912","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Using Edge Position Difference and Pixel Correlation for Aligning Stereo-Camera Generated 3D Scans
Pub Date : 2018-12-01  DOI: 10.1109/DICTA.2018.8615836
Deepak Rajamohan, M. Pickering, M. Garratt
The projection of a textured 3D scan, with a fixed scale, will spatially align with the 2D image of the scanned scene only at a unique pose of the scan. If misaligned, the true 3D alignment can be estimated using information from a 2D-2D registration process that minimizes an appropriate error criterion by penalizing mismatch between the overlapping images. Scan data from complicated real-world scenes poses a challenging registration problem due to the tendency of the optimization procedure to become trapped in local minima. In addition, the 3D scan from a stereo camera is of very high resolution and shows mild geometrical distortion, adding to the difficulty. This work presents a new registration process using a similarity measure named Edge Position Difference (EPD) combined with a pixel-based correlation similarity measure. Together, these measures yield consistent and robust 3D-2D registration performance on stereo data, showcasing the potential for extending the technique to practical large-scale mapping applications.
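One plausible way to combine the two cues is a weighted score of normalised cross-correlation and an EPD-style edge term, evaluated per candidate pose; the weighting and the exact edge measure below are assumptions, not the paper's formulation.

```python
# Assumed combined similarity between a projected 3D scan and the camera image.
import numpy as np

def ncc(a, b):
    # Normalised cross-correlation over overlapping pixels.
    a = (a - a.mean()) / (a.std() + 1e-12)
    b = (b - b.mean()) / (b.std() + 1e-12)
    return (a * b).mean()

def combined_similarity(projected, image, epd_term, alpha=0.5):
    # epd_term: edge-position dissimilarity in [0, 1], lower is better.
    # Higher combined score means better 3D-2D alignment at this pose.
    return alpha * ncc(projected, image) - (1.0 - alpha) * epd_term
```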
{"title":"Using Edge Position Difference and Pixel Correlation for Aligning Stereo-Camera Generated 3D Scans","authors":"Deepak Rajamohan, M. Pickering, M. Garratt","doi":"10.1109/DICTA.2018.8615836","DOIUrl":"https://doi.org/10.1109/DICTA.2018.8615836","url":null,"abstract":"Projection of a textured 3D scan, with a fixed scale, will spatially align with the 2D image of the scanned scene only at an unique pose of the scan. If misaligned, the true 3D alignment can be estimated using information from a 2D-2D registration process that minimizes an appropriate error criteria by penalizing mismatch between the overlapping images. Scan data from complicated real-world scenes poses a challenging registration problem due to the tendency of the optimization procedure to become trapped in local minima. In addition, the 3D scan from a stereo camera is of very highresolution and shows mild geometrical distortion adding to the difficulty. This work presents a new registration process using a similarity measure named Edge Position Difference (EPD) combined with a pixel based correlation similarity measure. Together, the technique is able to show consistent and robust 3D-2D registration performance using stereo data, showcasing the potential for extending the technique for practical large scale mapping applications.","PeriodicalId":130057,"journal":{"name":"2018 Digital Image Computing: Techniques and Applications (DICTA)","volume":"304 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129639494","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Bay Lobsters Moulting Stage Analysis Based on High-Order Texture Descriptor
Pub Date : 2018-12-01  DOI: 10.1109/DICTA.2018.8615832
M. Asif, Yongsheng Gao, Jun Zhou
In this paper, we introduce the world's first method to automatically classify the moulting stage of bay lobsters, formally known as Thenus orientalis, in a controlled environment. Our classification approach only requires top-view images of the bay lobster exoskeleton. We analyze the exoskeleton texture to categorize lobsters into normal, moulting, and freshly moulted classes. To meet the efficiency and robustness requirements of a production platform, we leverage traditional approaches such as the Local Binary Pattern and Local Derivative Pattern with an enhanced encoding scheme for underwater imagery. We also build a dataset of 315 bay lobster images captured in a controlled underwater environment. Experimental results on this dataset demonstrate that the proposed method can effectively classify bay lobsters with high accuracy.
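As an illustrative baseline for the texture descriptors mentioned above, the sketch below builds uniform LBP histograms and classifies them with an SVM; the enhanced encoding scheme for underwater imagery is not reproduced, and the classifier choice is an assumption.

```python
# LBP-histogram texture features + SVM (scikit-image / scikit-learn).
import numpy as np
from skimage.feature import local_binary_pattern
from sklearn.svm import SVC

def lbp_histogram(gray, P=8, R=1.0):
    # gray: 2-D grayscale top-view image of the exoskeleton.
    codes = local_binary_pattern(gray, P, R, method="uniform")
    hist, _ = np.histogram(codes, bins=P + 2, range=(0, P + 2), density=True)
    return hist

# images and labels (normal / moulting / freshly moulted) assumed prepared:
# X = np.stack([lbp_histogram(img) for img in images])
# clf = SVC(kernel="rbf").fit(X, labels)
```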
{"title":"Bay Lobsters Moulting Stage Analysis Based on High-Order Texture Descriptor","authors":"M. Asif, Yongsheng Gao, Jun Zhou","doi":"10.1109/DICTA.2018.8615832","DOIUrl":"https://doi.org/10.1109/DICTA.2018.8615832","url":null,"abstract":"In this paper, we introduce the world's first method to automatically classify the moulting stage of Bay lobsters, formally known as Thenus orientális, in a controlled environment. Our classification approach only requires top view images of exoskeleton of bay lobsters. We analyzed the texture of exoskeleton to categorize into normal, moulting stage, and freshly moulted classes. To meet the efficiency and robustness requirements of production platform, we leverage traditional approach such as Local Binary Pattern and Local Derivative Pattern with enhanced encoding scheme for underwater imagery. We also build a dataset of 315 bay lobster images captured at the controlled under water environment. Experimental results on this dataset demonstrated that the proposed method can effectively classify bay lobsters with a high accuracy.","PeriodicalId":130057,"journal":{"name":"2018 Digital Image Computing: Techniques and Applications (DICTA)","volume":"10 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128928937","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Kernel Support Vector Machines and Convolutional Neural Networks
Pub Date : 2018-12-01  DOI: 10.1109/DICTA.2018.8615840
Shihao Jiang, R. Hartley, Basura Fernando
Convolutional Neural Networks (CNNs) have achieved great success in various computer vision tasks due to their strong feature extraction ability. The trend in CNN architecture development is to increase depth so as to increase feature extraction ability. Kernel Support Vector Machines (SVMs), on the other hand, are known to give optimal separating surfaces through their ability to automatically select support vectors and perform classification in higher-dimensional spaces. We investigate the idea of combining the two so that the best of both worlds can be achieved and a more compact model can perform as well as deeper CNNs. In the past, attempts have been made to use CNNs to extract features from images and then classify them with a kernel SVM, but this process was performed in two separate steps. In this paper, we propose a single model where a CNN and a kernel SVM are integrated together and can be trained end-to-end. In particular, we propose a fully-differentiable Radial Basis Function (RBF) layer, which can be seamlessly adapted to a CNN and forms a better classifier than the normal linear classifier. Due to end-to-end training, our approach allows the initial layers of the CNN to extract features more adapted to the kernel SVM classifier. Our experiments demonstrate that the hybrid CNN-kSVM model gives superior results to a plain CNN model, and also performs better than the method where feature extraction and classification are performed in separate stages, by a CNN and a kernel SVM respectively.
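A minimal differentiable RBF layer of the kind described can be written as follows: each unit responds with exp(-gamma * ||x - c||^2) to a learned centre c, so it trains end-to-end behind a CNN feature extractor. The number of centres and the gamma parameterisation are assumptions.

```python
# Sketch of a trainable RBF layer (PyTorch).
import torch
import torch.nn as nn

class RBFLayer(nn.Module):
    def __init__(self, in_features, n_centres):
        super().__init__()
        self.centres = nn.Parameter(torch.randn(n_centres, in_features))
        self.log_gamma = nn.Parameter(torch.zeros(n_centres))  # per-centre width

    def forward(self, x):
        # x: (N, in_features) CNN features; output: (N, n_centres) RBF responses.
        d2 = torch.cdist(x, self.centres) ** 2
        return torch.exp(-torch.exp(self.log_gamma) * d2)

# Example head on 128-D CNN features for a 10-class problem (assumed sizes):
# head = nn.Sequential(RBFLayer(128, 64), nn.Linear(64, 10))
```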
{"title":"Kernel Support Vector Machines and Convolutional Neural Networks","authors":"Shihao Jiang, R. Hartley, Basura Fernando","doi":"10.1109/DICTA.2018.8615840","DOIUrl":"https://doi.org/10.1109/DICTA.2018.8615840","url":null,"abstract":"Convolutional Neural Networks (CNN) have achieved great success in various computer vision tasks due to their strong ability in feature extraction. The trend of development of CNN architectures is to increase their depth so as to increase their feature extraction ability. Kernel Support Vector Machines (SVM), on the other hand, are known to give optimal separating surfaces by their ability to automatically select support vectors and perform classification in higher dimensional spaces. We investigate the idea of combining the two such that best of both worlds can be achieved and a more compact model can perform as well as deeper CNNs. In the past, attempts have been made to use CNNs to extract features from images and then classify with a kernel SVM, but this process was performed in two separate steps. In this paper, we propose one single model where a CNN and a kernel SVM are integrated together and can be trained end-to-end. In particular, we propose a fully-differentiable Radial Basis Function (RBF) layer, where it can be seamless adapted to a CNN environment and forms a better classifier compared to the normal linear classifier. Due to end-to-end training, our approach allows the initial layers of the CNN to extract features more adapted to the kernel SVM classifier. Our experiments demonstrate that the hybrid CNN-kSVM model gives superior results to a plain CNN model, and also performs better than the method where feature extraction and classification are performed in separate stages, by a CNN and a kernel SVM respectively.","PeriodicalId":130057,"journal":{"name":"2018 Digital Image Computing: Techniques and Applications (DICTA)","volume":"126 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123033327","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Virtual View Quality Enhancement using Side View Temporal Modelling Information for Free Viewpoint Video
Pub Date : 2018-12-01  DOI: 10.1109/DICTA.2018.8615827
D. M. Rahaman, M. Paul, N. J. Shoumy
Virtual viewpoint video needs to be synthesised from adjacent reference viewpoints to provide an immersive perceptual 3D viewing experience of a scene. View synthesis techniques suffer from poor rendering quality due to holes created by occlusion in the warping process. Currently, the spatial and temporal correlation of texture images and depth maps is exploited to improve the quality of the final synthesised view. Due to the low spatial correlation at the edges between foreground and background pixels, spatial-correlation techniques, e.g. inpainting and inverse mapping (IM), cannot fill holes effectively. Conversely, exploiting the temporal correlation among already synthesised frames, learned through Gaussian mixture modelling (GMM), fills missing pixels in occluded areas efficiently. In this process, however, there are no frames available for GMM learning when the user switches view instantly. To address the above issues, in the proposed view synthesis technique, we apply GMM to the adjacent reference viewpoint texture images and depth maps to generate a most common frame in a scene (McFIS). Then, the texture McFIS is warped into the target viewpoint using the depth McFIS, and both warped McFISes are merged. We then utilize the number of GMM models to refine the pixel intensities of the synthesised view by using a weighting factor between the pixel intensities of the merged McFIS and the warped images. This technique provides better pixel correspondence and improves PSNR by 0.58–0.70 dB compared to the IM technique.
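As a rough stand-in for the per-pixel GMM modelling that yields the McFIS, the sketch below uses OpenCV's MOG2 background model over the reference-view frames and reads back its background image; this approximates, rather than reproduces, the paper's formulation.

```python
# Approximate "most common frame" via per-pixel Gaussian mixture modelling.
import cv2

def most_common_frame(frames):
    # frames: list of uint8 BGR images from one reference viewpoint.
    mog = cv2.createBackgroundSubtractorMOG2(history=len(frames),
                                             detectShadows=False)
    for f in frames:
        mog.apply(f)                      # update the per-pixel mixture model
    return mog.getBackgroundImage()       # most probable appearance per pixel
```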
{"title":"Virtual View Quality Enhancement using Side View Temporal Modelling Information for Free Viewpoint Video","authors":"D. M. Rahaman, M. Paul, N. J. Shoumy","doi":"10.1109/DICTA.2018.8615827","DOIUrl":"https://doi.org/10.1109/DICTA.2018.8615827","url":null,"abstract":"Virtual viewpoint video needs to be synthesised from adjacent reference viewpoints to provide immersive perceptual 3D viewing experience of a scene. View synthesised techniques suffer poor rendering quality due to holes created by occlusion in the warping process. Currently, spatial and temporal correlation of texture images and depth maps are exploited to improve the quality of the final synthesised view. Due to the low spatial correlation at the edge between foreground and background pixels, spatial correlation e.g. inpainting and inverse mapping (IM) techniques cannot fill holes effectively. Conversely, a temporal correlation among already synthesised frames through learning by Gaussian mixture modelling (GMM) fill missing pixels in occluded areas efficiently. In this process, there are no frames for GMM learning when the user switches view instantly. To address the above issues, in the proposed view synthesis technique, we apply GMM on the adjacent reference viewpoint texture images and depth maps to generate a most common frame in a scene (McFIS). Then, texture McFIS is warped into the target viewpoint by using depth McFIS and both warped McFISes are merged. Then, we utilize the number of GMM models to refine pixel intensities of the synthesised view by using a weighting factor between the pixel intensities of the merged McFIS and the warped images. This technique provides a better pixel correspondence and improves 0.58∼0.70dB PSNR compared to the IM technique.","PeriodicalId":130057,"journal":{"name":"2018 Digital Image Computing: Techniques and Applications (DICTA)","volume":"41 163 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131605778","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}