Research on the Counting Algorithm of Bundled Steel Bars Based on the Features Matching of Connected Regions
Pub Date: 2018-06-27 | DOI: 10.1109/ICIVC.2018.8492784
Xing Yan, X. Chen
Bundled steel bars are tightly packed, their end surfaces are irregularly shaped, and adjacent ends often adhere in the image, so automatic counting is prone to over-counting and under-counting. To address this, a fast and accurate counting method based on single/multi classification and feature matching of connected regions is proposed. First, the image of the bars' end surfaces is segmented and preprocessed with morphological and other operations to remove most of the adhesions. Features of the resulting binary image are then extracted for each connected region, including its area, diameter, center of gravity, and shape factor. Based on the area feature, each connected region is classified as a single bar or multiple adhered bars: single-bar regions are counted quickly by area feature matching, while multi-bar regions are identified from their center-of-gravity characteristics, and a template built from area, shape, and other factors is matched to count them. The method thus achieves efficient and accurate counting.
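The paper gives no implementation; the following is a minimal sketch of the connected-region feature extraction and area-based single/multi classification it describes, assuming an already binarized end-surface image and a known reference single-bar area `ref_area` (both hypothetical inputs). The area-ratio count for adhered regions is a crude stand-in for the paper's template-matching step.

```python
import cv2
import numpy as np

def count_bars(binary_img, ref_area, multi_factor=1.5):
    """Count steel bars from a binarized end-surface image.

    ref_area: expected pixel area of one bar (assumed known, e.g.
    estimated from the median connected-region area). Regions larger
    than multi_factor * ref_area are treated as several adhered bars
    and counted by area ratio.
    """
    # Morphological opening removes small adhesions between bar ends.
    kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (5, 5))
    cleaned = cv2.morphologyEx(binary_img, cv2.MORPH_OPEN, kernel)

    n, labels, stats, centroids = cv2.connectedComponentsWithStats(cleaned)
    count = 0
    for i in range(1, n):  # label 0 is the background
        area = stats[i, cv2.CC_STAT_AREA]
        if area < 0.3 * ref_area:
            continue  # noise blob, too small to be a bar
        if area <= multi_factor * ref_area:
            count += 1                            # single bar
        else:
            count += int(round(area / ref_area))  # adhered bars
    return count
```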
{"title":"Research on the Counting Algorithm of Bundled Steel Bars Based on the Features Matching of Connected Regions","authors":"Xing Yan, X. Chen","doi":"10.1109/ICIVC.2018.8492784","DOIUrl":"https://doi.org/10.1109/ICIVC.2018.8492784","url":null,"abstract":"Aiming at the problem that tight arrangement of bundled steel bar, irregular shape of bar end surfaces, adhesions and in the automatic counting prone to multi-count and less-count, a fast and accurate counting method based on single-multi-classification of the connected regions' feature matching is proposed. Firstly, the image of the steel bars' end surface is segmented, morphological and other preprocessing to remove most of the adhesion, and then extracts the features of handled binary image including the area, diameter, center of gravity, shape factor of connected region, according to the area characteristics, the target single - Multi - Classification of the bar image target is classified, and the area feature matching of the single steel bar is counted quickly too, and according to the characteristics of the center of gravity to identify the identification of steel, multiple steel bars establish the template and combine with area, form and other factor to matching counting, so as to achieve the purpose of high efficiency and accurate counting.","PeriodicalId":173981,"journal":{"name":"2018 IEEE 3rd International Conference on Image, Vision and Computing (ICIVC)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-06-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128799721","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
An Investigation of Skeleton-Based Optical Flow-Guided Features for 3D Action Recognition Using a Multi-Stream CNN Model
Pub Date: 2018-06-27 | DOI: 10.1109/ICIVC.2018.8492894
J. Ren, N. Reyes, A. Barczak, C. Scogings, M. Liu
Deep learning-based techniques have recently proven highly effective for skeleton-based action recognition tasks, and modeling spatiotemporal variations has been observed to be the key to effective skeleton-based approaches. This work proposes a simple yet effective method for encoding different geometric relational features into static color texture images; collectively, we refer to these features as skeletal optical flow-guided features. The temporal variations of the different features are converted into color variations of their corresponding images. A multi-stream CNN model is then employed to pick up the discriminative patterns in the converted images for subsequent classification. Experimental results demonstrate that the proposed geometric relational features and framework achieve competitive performance on both the MSR Action 3D and NTU RGB+D datasets.
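The abstract does not specify the encoding beyond mapping temporal variation to color variation; below is a minimal sketch of one common joints-by-frames encoding under that assumption (the paper's exact skeletal optical flow-guided features are not reproduced):

```python
import numpy as np

def skeleton_to_image(seq):
    """Encode a skeleton sequence as a color texture image.

    seq: array of shape (frames, joints, 3) holding x/y/z coordinates.
    Each joint becomes an image row, each frame a column, and the three
    coordinates map to the R/G/B channels, so motion over time appears
    as horizontal color variation (the general idea the paper uses).
    """
    seq = np.asarray(seq, dtype=np.float32)
    lo = seq.min(axis=(0, 1), keepdims=True)
    hi = seq.max(axis=(0, 1), keepdims=True)
    norm = (seq - lo) / np.maximum(hi - lo, 1e-6)  # per-channel [0, 1]
    # Transpose to (joints, frames, 3) and quantize to an 8-bit image.
    return (norm.transpose(1, 0, 2) * 255).astype(np.uint8)
```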
{"title":"An Investigation of Skeleton-Based Optical Flow-Guided Features for 3D Action Recognition Using a Multi-Stream CNN Model","authors":"J. Ren, N. Reyes, A. Barczak, C. Scogings, M. Liu","doi":"10.1109/ICIVC.2018.8492894","DOIUrl":"https://doi.org/10.1109/ICIVC.2018.8492894","url":null,"abstract":"Deep learning-based techniques have recently been found significantly effective for handling skeleton-based action recognition tasks. It was observed that modeling the spatiotemporal variations is the key to effective skeleton-based action recognition approaches. This work proposes an easy and yet effective method for encoding different geometric relational features into static color texture images. Collectively, we refer to these features as skeletal optical flow-guided features. The temporal variations of different features are converted into the color variations of their corresponding images. Then, a multi-stream CNN model is employed to pick up the discriminating patterns that exist in the converted images for subsequent classification. Experimental results demonstrate that our proposed geometric relational features and framework can achieve competitive performances on both MSR Action 3D and NTU RGB+D datasets.","PeriodicalId":173981,"journal":{"name":"2018 IEEE 3rd International Conference on Image, Vision and Computing (ICIVC)","volume":"174 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-06-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114621320","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
High Accuracy Smartphone Video Calibration for Human Foot Surface Mapping
Pub Date: 2018-06-01 | DOI: 10.1109/ICIVC.2018.8492792
Ali A. Al-kharaz, A. Chong
Although digital cameras are readily available and prices keep falling, many users still regard them as expensive devices that a smartphone camera can replace. Both digital cameras and smartphones, however, must be calibrated to extract three-dimensional (3D) space information from two-dimensional (2D) images with accurate results. This study used close-range photogrammetry to calibrate two high-resolution digital cameras and a Samsung Galaxy smartphone, and to determine whether any of them yields high-accuracy 3D coordinates of retro-reflective targets, computed with the self-calibration bundle adjustment method in two phases. In the first phase, three trials were conducted while the subject walked, using the same three cameras for each trial. In the second phase, one trial was conducted while the subject stood, with each camera type placed in front of the platform. The results show that, arguably, the Samsung Galaxy S6 camera is more accurate than the other cameras. In addition, the study provides information on how to calibrate one board from another board that has already been calibrated.
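The study's self-calibration bundle adjustment with retro-reflective targets is not reproduced here; as a rough stand-in, this sketch shows standard intrinsic calibration from video frames with OpenCV, where the checkerboard pattern size and frame directory are hypothetical:

```python
import glob
import cv2
import numpy as np

PATTERN = (9, 6)  # inner checkerboard corners per row/column -- an assumption
objp = np.zeros((PATTERN[0] * PATTERN[1], 3), np.float32)
objp[:, :2] = np.mgrid[0:PATTERN[0], 0:PATTERN[1]].T.reshape(-1, 2)

obj_pts, img_pts, img_size = [], [], None
for path in glob.glob("frames/*.png"):  # hypothetical frame directory
    gray = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    img_size = gray.shape[::-1]
    found, corners = cv2.findChessboardCorners(gray, PATTERN)
    if found:
        obj_pts.append(objp)
        img_pts.append(corners)

# Recover the intrinsic matrix K and lens distortion coefficients;
# ret is the RMS reprojection error in pixels.
ret, K, dist, rvecs, tvecs = cv2.calibrateCamera(
    obj_pts, img_pts, img_size, None, None)
print("RMS reprojection error:", ret)
```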
{"title":"High Accuracy Smartphone Video Calibration for Human Foot Surface Mapping","authors":"Ali A. Al-kharaz, A. Chong","doi":"10.1109/ICIVC.2018.8492792","DOIUrl":"https://doi.org/10.1109/ICIVC.2018.8492792","url":null,"abstract":"Although the digital camera is readily available and the price is decreasing, many users still consider it an expensive device that can be dispensed with by using a smart phone camera. However, both the digital camera and the smartphone need to be calibrated to extract three dimensional (3D) space information from (2D) and to obtain accurate results. This study used close range photogrammetry to calibrate two high resolution digital cameras and a Samsung Galaxy smartphone to find whether any one of them give high accuracy 3D coordinates of the retro-reflective targets that were determined using the self-calibration bundle adjustment method in two phases. The first phase is during walking when 3 trials are conducted. The same three cameras are used for each trial. The second phase is during standing when one trial is conducted. Each of the camera types is placed in front of the platform. The results showed that arguably, the Samsung Galaxy S6 camera is most significant than other cameras in term of accuracy. In addition, this study provides information on how to calibrate one board from other board that has already been calibrated.","PeriodicalId":173981,"journal":{"name":"2018 IEEE 3rd International Conference on Image, Vision and Computing (ICIVC)","volume":"31 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125122108","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Surface Defect Detection Based on Gradient LBP
Pub Date: 2018-06-01 | DOI: 10.1109/ICIVC.2018.8492798
Xiaojing Liu, Feng Xue, Lu Teng
The histogram obtained with the local binary pattern (LBP) method usually has a high dimensionality, which is not conducive to computation. Moreover, the standard LBP uses the gray-level difference between individual pixels as its output, which is not robust to noise and illumination. This paper therefore improves the traditional LBP and proposes a surface defect detection method based on a gradient local binary pattern (GLBP), which uses image sub-blocks to reduce the dimensionality of the LBP data matrix. The method adopts weighted binary output values in eight directions within the neighborhood to represent local gray-level changes, suppressing the effects of illumination and noise on the detection results. Experiments show that the method locates defects well and provides good feature information for subsequent defect classification.
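For orientation, a minimal numpy sketch of the classic 8-neighbor LBP code that the paper builds on; the gradient weighting that defines GLBP is the paper's contribution and is not reproduced here:

```python
import numpy as np

def lbp_8neighbor(gray):
    """Classic 3x3 local binary pattern.

    Each pixel is compared with its eight neighbors; neighbors whose
    gray value is >= the center contribute a power-of-two weight to the
    pixel's code. GLBP, per the paper, replaces these raw comparisons
    with weighted gradient responses in the eight directions.
    """
    g = gray.astype(np.int32)
    c = g[1:-1, 1:-1]  # centers (borders are skipped)
    # Neighbor offsets in clockwise order starting at the top-left.
    offs = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
            (1, 1), (1, 0), (1, -1), (0, -1)]
    code = np.zeros_like(c)
    for bit, (dy, dx) in enumerate(offs):
        nb = g[1 + dy:g.shape[0] - 1 + dy, 1 + dx:g.shape[1] - 1 + dx]
        code |= (nb >= c).astype(np.int32) << bit
    return code.astype(np.uint8)
```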
{"title":"Surface Defect Detection Based on Gradient LBP","authors":"Xiaojing Liu, Feng Xue, Lu Teng","doi":"10.1109/ICIVC.2018.8492798","DOIUrl":"https://doi.org/10.1109/ICIVC.2018.8492798","url":null,"abstract":"The LBP histogram obtained based on the local binary pattern (LBP) method usually has a higher dimension, and not conducive to calculation. The LBP method adopts the gray difference value between single points as the LBP output value, which is not robust to noise and illumination. Therefore, this paper improves the traditional LBP method and proposes a surface defect detection method based on gradient local binary pattern (GLBP), which uses image sub-blocks to reduce the dimensionality of the LBP data matrix. The method adopts weighted binary output values in eight directions within the neighborhood to indicate local gray changes, which suppresses the effects of light and noise on the detection results. Experiments show that the method can determine the defect location well and provide good feature information for subsequent defect classification.","PeriodicalId":173981,"journal":{"name":"2018 IEEE 3rd International Conference on Image, Vision and Computing (ICIVC)","volume":"2016 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115551158","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Fire Smoke Detection Based on Contextual Object Detection
Pub Date: 2018-06-01 | DOI: 10.1109/ICIVC.2018.8492823
Xuan Zhaa, Hang Ji, Deng-yin Zhang, H. Bao
Smoke detection based on automatic visual systems has been applied to fire alarms in open spaces, where traditional smoke detection systems are unsuitable. However, detecting the course of smoke poses great challenges for both kinds of system. To address this problem, we propose a new method that combines a context-aware framework with automatic visual smoke detection. The strategy is evaluated on a dataset, and the results demonstrate the effectiveness of the proposed method.
{"title":"Fire Smoke Detection Based on Contextual Object Detection","authors":"Xuan Zhaa, Hang Ji, Deng-yin Zhang, H. Bao","doi":"10.1109/ICIVC.2018.8492823","DOIUrl":"https://doi.org/10.1109/ICIVC.2018.8492823","url":null,"abstract":"Smoke detection based on automatic visual system has been applied to fire alarm in open spaces where traditional smoke detection system is not suitable for it. However, detecting the course of smoke posed great challenges for both systems. To address this problem, we propose a new method that combines context-aware framework with automatic visual smoke detection. The strategy is evaluated on dataset and the results demonstrate the effectiveness of the proposed method.","PeriodicalId":173981,"journal":{"name":"2018 IEEE 3rd International Conference on Image, Vision and Computing (ICIVC)","volume":"365 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122315086","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Adaptive Multi-Level Saliency Network in 3D Generation
Pub Date: 2018-06-01 | DOI: 10.1109/ICIVC.2018.8492765
Zhongliang Tang
Many CNNs with an encoder-decoder structure are widely used in supervised 3D voxel generation. However, their convolutional encoders are usually too simple, which causes some local features to degrade during convolution, making it difficult to extract a good feature representation from an input image. Some CNNs add skip-connection layers to the encoder to reduce this degradation, but generic skip connections such as residual layers are not good enough, especially when the encoder has relatively few convolutional layers. In this paper, we propose a novel structure called the adaptive multi-level saliency network (AMSN) to reduce the degradation of local features. The major innovations of AMSN are multi-level saliency convolution kernels (MSCK) and a saliency fusion layer. Unlike the kernels used in generic skip-connection layers, MSCK are adaptively learned from multi-level salient feature maps rather than initialized randomly; the salient feature maps are sampled from multiple layers of the encoder. Because MSCK acquire multi-level features more easily, we use them to capture local features from low-level layers before the degradation occurs, and the captured features are then fused back into the encoder through a saliency fusion layer. We evaluated our approach on the ShapeNet and ModelNet40 datasets, and the results indicate that AMSN outperforms related works.
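The MSCK construction is only described at a high level; the sketch below is a speculative stand-in that derives depthwise convolution kernels from a low-level feature map (rather than initializing them randomly) and fuses the result back into a deeper encoder stage, using PyTorch's grouped-convolution trick for per-sample kernels:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SaliencyFusion(nn.Module):
    """Hypothetical sketch of the AMSN idea, not the paper's exact layer:
    kernels are predicted from low-level feature statistics and used to
    re-inject local detail into a deeper feature map.
    """

    def __init__(self, low_ch, deep_ch, k=3):
        super().__init__()
        self.k = k
        # Predict one k x k depthwise kernel per deep channel
        # from globally pooled low-level statistics.
        self.kernel_gen = nn.Linear(low_ch, deep_ch * k * k)
        self.fuse = nn.Conv2d(deep_ch * 2, deep_ch, kernel_size=1)

    def forward(self, low_feat, deep_feat):
        b, c, h, w = deep_feat.shape
        stats = F.adaptive_avg_pool2d(low_feat, 1).flatten(1)  # (b, low_ch)
        kernels = self.kernel_gen(stats).view(b * c, 1, self.k, self.k)
        # Per-sample depthwise convolution via the groups trick.
        x = deep_feat.reshape(1, b * c, h, w)
        sal = F.conv2d(x, kernels, padding=self.k // 2, groups=b * c)
        sal = sal.view(b, c, h, w)
        return self.fuse(torch.cat([deep_feat, sal], dim=1))
```

For example, fusing a 256-channel deep stage with 64-channel low-level features would be `SaliencyFusion(64, 256)(low, deep)`.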
{"title":"Adaptive Multi-Level Saliency Network in 3D Generation","authors":"Zhongliang Tang","doi":"10.1109/ICIVC.2018.8492765","DOIUrl":"https://doi.org/10.1109/ICIVC.2018.8492765","url":null,"abstract":"Many CNNs with encoder-decoder structure are widely used in supervised 3D voxel generation. However, their convolutional encoders are usually too simple which causes some local features to degrade during convolution, so it is difficult to extract a good feature representation from an input image by using a simple encoder. Some CNNs apply skip-connection layer for the encoder to reduce the degradation, but general skip-connection layer such as residual layer is not good enough especially when the quantity of convolutional layers in the encoder is relatively small. In this paper, we propose a novel structure called adaptive multi-level saliency network (AMSN) to reduce the degradation of local features. The major innovations of AMSN are multi-level saliency convolution kernels (MSCK) and saliency fusion layer. Different from the kernels used in general skip-connection layer, MSCK are adaptively learned from multi-level salient feature maps rather than initialized randomly. The salient feature maps are sampled from multiple layers in the encoder. MSCK can acquire multi-level features more easily so that we utilize MSCK to acquire local features from low-level layer before the degradation. After that, the acquired local features are fused back into encoder through a saliency fusion layer to reduce the degradation. We evaluated our approach on the ShapeNet and ModelNet40 dataset. The results indicate that our AMSN performs better than related works.","PeriodicalId":173981,"journal":{"name":"2018 IEEE 3rd International Conference on Image, Vision and Computing (ICIVC)","volume":"127 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122902652","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Comparison of Several Hyperspectral Image Fusion Methods for Superresolution
Pub Date: 2018-06-01 | DOI: 10.1109/ICIVC.2018.8492889
Hongwen Lin, Jian Chen
Hyperspectral image applications have been explored in various areas, but the images often suffer from coarse spatial resolution. In recent years, many fusion approaches that merge a hyperspectral image with a multispectral or panchromatic one have been proposed to improve the spatial resolution of hyperspectral images. In this paper, we compare four state-of-the-art hyperspectral fusion methods: coupled nonnegative matrix factorization (CNMF), sparse matrix factorization (SPMF), hyperspectral image superresolution (HySure), and sparse representation (SPRE). The main idea of each method is described briefly, and five statistical assessment parameters, namely cross correlation (CC), root-mean-square error (RMSE), spectral angle mapper (SAM), universal image quality index (UIQI), and relative dimensionless global error in synthesis (ERGAS), are adopted to analyze the fusion results comparatively. The experimental results show that the sparse-representation-based method outperforms the others.
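The assessment metrics are standard; a minimal numpy sketch of two of them, SAM and ERGAS, following their usual definitions:

```python
import numpy as np

def sam(ref, fused):
    """Mean spectral angle mapper (radians) between two hyperspectral
    cubes of shape (H, W, bands); lower is better."""
    r = ref.reshape(-1, ref.shape[-1]).astype(np.float64)
    f = fused.reshape(-1, fused.shape[-1]).astype(np.float64)
    num = (r * f).sum(axis=1)
    den = np.linalg.norm(r, axis=1) * np.linalg.norm(f, axis=1) + 1e-12
    return np.mean(np.arccos(np.clip(num / den, -1.0, 1.0)))

def ergas(ref, fused, ratio):
    """Relative dimensionless global error in synthesis; ratio is the
    high-to-low resolution pixel-size ratio (e.g., 1/4 for 4x)."""
    ref = ref.astype(np.float64)
    fused = fused.astype(np.float64)
    rmse2 = ((ref - fused) ** 2).mean(axis=(0, 1))   # per-band RMSE^2
    mean2 = ref.mean(axis=(0, 1)) ** 2 + 1e-12       # per-band mean^2
    return 100.0 * ratio * np.sqrt((rmse2 / mean2).mean())
```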
{"title":"Comparison of Several Hyperspectral Image Fusion Methods for Superresolution","authors":"Hongwen Lin, Jian Chen","doi":"10.1109/ICIVC.2018.8492889","DOIUrl":"https://doi.org/10.1109/ICIVC.2018.8492889","url":null,"abstract":"Hyperspectral image applications have been explored in various areas, but they are often suffered from coarser spatial resolutions. In recent years, many hyperspectral image fusion approaches which merge hyperspectral image with multi-spectral or panchromatic one have been presented to improve the spatial resolution of hyperspectral image. In this paper, we compared four state-of-the-art hyperspectral fusion methods, namely coupled nonnegative matrix factorization (CNMF) method, sparse matrix factorization (SPMF) method, hyperspectral Image superresolution (HySure) method and sparse representation (SPRE) method. The main idea of each method is depicted briefly, five statistical assessment parameters, namely cross correlation (CC), root-mean-square error (RMSE), spectral angle mapper (SAM), universal image quality index (UIQI), and relative dimensionless global error in synthesis (ERGAS) are adopted to comparatively analyze the fusion results. The experimental results show that the effect of method based on sparse representation is superior to the others one.","PeriodicalId":173981,"journal":{"name":"2018 IEEE 3rd International Conference on Image, Vision and Computing (ICIVC)","volume":"34 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128589318","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Iterative Stochastic Resonance Model for Visual Information Extraction from Noisy Environment
Pub Date: 2018-06-01 | DOI: 10.1109/ICIVC.2018.8492835
Ling Wang, Peng Miao
The stochastic resonance phenomenon offers a new viewpoint on the relation between noise and information, treating noise as a factor that interacts with the information. In this paper, building on the stochastic resonance behavior of neurons in the human visual system, we propose a new model called iterative stochastic resonance for extracting visual information from noisy images. The algorithm introduces appropriate noise into the noisy image so that signal and noise produce a synergistic effect, thereby increasing the energy of the useful signal. The model further mimics characteristics of the human visual system, and the result is computed iteratively over several passes. The model gives excellent denoising output for both simulated and real laser speckle contrast images disturbed by strong noise, offering a new way to extract effective information from noisy medical images.
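The paper's exact iteration is not reproduced here; below is a minimal sketch of the underlying threshold stochastic resonance step, averaging many noise-perturbed thresholdings of a weak image:

```python
import numpy as np

def sr_enhance(img, noise_sigma=0.1, threshold=0.5, n_iter=50, rng=None):
    """Threshold-based stochastic resonance on a [0, 1] grayscale image.

    A weak sub-threshold signal alone never crosses the threshold; with
    added noise it crosses with a probability that increases with the
    signal, so averaging many noisy thresholdings recovers a cleaner
    estimate. The paper iterates a visual-system-inspired variant of
    this basic idea.
    """
    rng = np.random.default_rng() if rng is None else rng
    acc = np.zeros_like(img, dtype=np.float64)
    for _ in range(n_iter):
        noisy = img + rng.normal(0.0, noise_sigma, img.shape)
        acc += (noisy > threshold).astype(np.float64)
    return acc / n_iter
```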
{"title":"Iterative Stochastic Resonance Model for Visual Information Extraction from Noisy Environment","authors":"Ling Wang, Peng Miao","doi":"10.1109/ICIVC.2018.8492835","DOIUrl":"https://doi.org/10.1109/ICIVC.2018.8492835","url":null,"abstract":"Stochastic Resonance phenomenon brings us a new viewpoint of the relation between noise and information which considers the noise as an interacting factor with the information. In this paper, on the Stochastic Resonance phenomenon of neurons in the human visual system, we propose a new model called Iterative Stochastic Resonance, for the visual information extraction from noisy images. The algorithm introduces appropriate noise into the noisy image so that the signal and noise produce a synergistic effect, thereby increasing the energy of the useful signal. The model is then modeled on the characteristics of the human visual system and the results are iteratively computed several times. The model can give perfect denoising output for both simulated and real laser speckle contrast images which are both disturbed by strong noise. It is a new way to solve the problem of effective information extraction in medical noisy images.","PeriodicalId":173981,"journal":{"name":"2018 IEEE 3rd International Conference on Image, Vision and Computing (ICIVC)","volume":"37 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124778880","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Color Feature Unified-Based Approach for Visual Fixation
Pub Date: 2018-06-01 | DOI: 10.1109/ICIVC.2018.8492890
Zhaoxia Xie
Human visual attention can deal with complex scenes in real time, effortlessly and efficiently, and quickly detect the most interesting regions. Based on this characteristic, a more comprehensive computational framework is proposed that fully exploits color contrast to obtain visual fixations on natural color images. First, a color space conversion strategy is employed: the RGB images are converted into the HSV and Lab color spaces. Then, a superpixel generation algorithm is used to segment the natural images in both color spaces. Next, color feature contrast is computed in each color space, yielding a single fixation map per space. Finally, a color feature fusion strategy is adopted to produce the final fixation map. Experimental results show that the proposed framework improves visual fixations on natural color images compared with a single color space, and that it yields full-resolution fixation maps, unlike the context-aware approach. These results also demonstrate that the proposed saliency estimation model is effective.
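The superpixel algorithm and fusion rule are not named precisely; a minimal sketch assuming SLIC superpixels and mean fusion of the per-color-space contrast maps:

```python
import numpy as np
from skimage import color, segmentation

def fixation_map(rgb):
    """Superpixel color-contrast saliency fused across Lab and HSV.

    SLIC and mean fusion are assumptions; the paper does not name its
    superpixel algorithm or its exact fusion strategy.
    """
    labels = segmentation.slic(rgb, n_segments=300, start_label=0)
    n = labels.max() + 1
    maps = []
    for convert in (color.rgb2lab, color.rgb2hsv):
        feat = convert(rgb)
        means = np.array([feat[labels == i].mean(axis=0) for i in range(n)])
        # A superpixel is salient if its mean color is far from all others.
        contrast = np.linalg.norm(means[:, None] - means[None, :], axis=-1).sum(1)
        contrast = (contrast - contrast.min()) / (contrast.max() - contrast.min() + 1e-12)
        maps.append(contrast[labels])  # paint back to full resolution
    return np.mean(maps, axis=0)
```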
{"title":"Color Feature Unified-Based Approach for Visual Fixation","authors":"Zhaoxia Xie","doi":"10.1109/ICIVC.2018.8492890","DOIUrl":"https://doi.org/10.1109/ICIVC.2018.8492890","url":null,"abstract":"Human visual attention can be deal with complex scenes in real time effortlessly and efficiently and detect the most interesting regions quickly. Based on the characteristic of human visual attention, one more comprehensive computation framework is proposed which fully takes the advantage of color contrast to obtain visual fixations of natural color images. Firstly, the color space conversion strategy is employed. The RGB color images are converted into the HSV color space and Lab color space respectively. Then, the superpixels generation algorithm is utilized to segment natural images in the HSV color space and in the Lab color space. Next, color feature-contrast in the two color space is respectively implemented and the corresponding single visual fixation is obtained. Finally, the color feature-fused strategy is adopted in order to get the final visual fixation. Experimental results show that our proposed framework can effectively improve the effect of visual fixations compared with a single color space for the natural color images. Moreover, the full resolution visual fixations can be obtained by employing the proposed framework in this paper compared to the context-aware approach. Meanwhile, these experimental results also clearly demonstrate that the proposed model for saliency estimation is effective.","PeriodicalId":173981,"journal":{"name":"2018 IEEE 3rd International Conference on Image, Vision and Computing (ICIVC)","volume":"31 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121295264","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Insect Sound Recognition Based on Convolutional Neural Network
Pub Date: 2018-06-01 | DOI: 10.1109/ICIVC.2018.8492871
X. Dong, Ning Yan, Ying Wei
A novel insect sound recognition system using enhanced spectrograms and a convolutional neural network is proposed. Contrast-limited adaptive histogram equalization (CLAHE) is adopted to enhance the R-space spectrogram. Traditionally, hand-crafted feature extraction is an essential step of classification, which introduces extra noise caused by the subjectivity of individual researchers. In this paper, we instead construct a convolutional neural network (CNN) as the classifier, extracting deep features by machine learning. Mel-frequency cepstral coefficients (MFCC) and the chromatic spectrogram are compared with the enhanced R-space spectrogram as feature images. Ultimately, an accuracy of 97.8723% is achieved across 47 types of insect sound from the USDA library.
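The construction of the R-space spectrogram is not fully specified; a minimal sketch that enhances a standard log-magnitude spectrogram with OpenCV's CLAHE as an approximation, assuming librosa for audio loading (the file path and STFT parameters are hypothetical):

```python
import numpy as np
import cv2
import librosa

def enhanced_spectrogram(wav_path):
    """Build a CLAHE-enhanced spectrogram image for CNN input."""
    y, sr = librosa.load(wav_path, sr=None)
    spec = np.abs(librosa.stft(y, n_fft=1024, hop_length=256))
    log_spec = librosa.amplitude_to_db(spec, ref=np.max)
    # Scale to 8-bit so CLAHE can treat it as a grayscale image.
    img = cv2.normalize(log_spec, None, 0, 255,
                        cv2.NORM_MINMAX).astype(np.uint8)
    clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8))
    return clahe.apply(img)
```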
{"title":"Insect Sound Recognition Based on Convolutional Neural Network","authors":"X. Dong, Ning Yan, Ying Wei","doi":"10.1109/ICIVC.2018.8492871","DOIUrl":"https://doi.org/10.1109/ICIVC.2018.8492871","url":null,"abstract":"A novel insect sound recognition system using enhanced spectrogram and convolutional neural network is proposed. Contrast-limit adaptive histogram equalization (CLAHE) is adopted to enhance R-space spectrogram. Traditionally, artificial feature extraction is an essential step of classification, introducing extra noise caused by subjectivity of individual researchers. In this paper, we construct a convolutional neural network (CNN) as classifier, extracting deep feature by machine learning. Mel-Frequency Cepstral Coefficient (MFCC) and chromatic spectrogram have been compared with enhanced R-space spectrogram as feature image. Eventually, 97.8723 % accuracy rate is achieved among 47 types of insect sound from USDA library.","PeriodicalId":173981,"journal":{"name":"2018 IEEE 3rd International Conference on Image, Vision and Computing (ICIVC)","volume":"142 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114344497","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}