PCB Defect Detection Using Denoising Convolutional Autoencoders
Saeed Khalilian, Yeganeh Hallaj, Arian Balouchestani, Hossein Karshenas, A. Mohammadi
Pub Date: 2020-02-18, DOI: 10.1109/MVIP49855.2020.9187485
Printed circuit board (PCB) fabrication is one of the most important stages in making electronic products. A small defect in a PCB can cause significant flaws in the final product; hence, detecting and locating all defects in PCBs is essential. In this paper, we propose an approach based on denoising convolutional autoencoders for detecting defective PCBs and locating the defects. Denoising autoencoders take a corrupted image and try to recover the intact image. We trained our model with defective PCBs and forced it to repair the defective parts, so the model not only detects and locates all kinds of defects but can repair them as well. By subtracting the repaired output from the input, the defective parts are located. The experimental results indicate that our model detects defective PCBs with high accuracy (97.5%) compared to state-of-the-art works.
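The localization step described above (subtracting the repaired output from the input) can be sketched in a few lines of numpy. This is a minimal illustration, not the paper's implementation; the threshold value is a hypothetical choice.

```python
import numpy as np

def locate_defects(defective, repaired, threshold=0.2):
    """Locate defective pixels by thresholding the absolute difference
    between the input image and the autoencoder's repaired output.
    Images are float arrays in [0, 1]; `threshold` is a hypothetical value."""
    diff = np.abs(defective.astype(float) - repaired.astype(float))
    return diff > threshold            # True where the repair changed the image

# Toy example: the "repaired" image differs from the input in one small region.
defective = np.zeros((8, 8))
defective[2:4, 2:4] = 1.0              # simulated defect (e.g. spurious copper)
repaired = np.zeros((8, 8))            # model output with the defect removed
mask = locate_defects(defective, repaired)
print(int(mask.sum()))                 # 4 defective pixels located
```

Connected components of the resulting mask would then give the defect locations.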
Source Camera Identification Using WLBP Descriptor
Nasme Zandi, F. Razzazi
Pub Date: 2020-02-18, DOI: 10.1109/MVIP49855.2020.9187484
In this paper we introduce a camera identification method based on the WLBP texture descriptor, which has previously been used in texture and face classification. We propose to apply the WLBP operator to identify the source camera of an image, investigating the two-dimensional histogram of Weber features and local binary patterns (LBP). Experiments were conducted on the Dresden database. The proposed method reached an accuracy of 99.52% on nine digital cameras of different models; on JPEG images compressed with a quality factor of 70%, it reached 89.04%. The results indicate that the proposed method is highly accurate in comparison to other methods and exhibits relatively good robustness to compression.
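The two ingredients of WLBP can be sketched with numpy: an LBP code per pixel, and the Weber differential excitation (the arctangent of the summed relative difference between a pixel and its neighbours). The 2-D histogram of these two quantities forms the descriptor. This is a simplified sketch under common WLD/LBP definitions, not the paper's exact formulation.

```python
import numpy as np

def lbp_codes(img):
    """8-neighbour local binary pattern codes for the interior pixels."""
    c = img[1:-1, 1:-1]
    # neighbours ordered clockwise from the top-left
    shifts = [(0, 0), (0, 1), (0, 2), (1, 2), (2, 2), (2, 1), (2, 0), (1, 0)]
    codes = np.zeros_like(c, dtype=np.uint8)
    for bit, (dy, dx) in enumerate(shifts):
        n = img[dy:dy + c.shape[0], dx:dx + c.shape[1]]
        codes |= ((n >= c).astype(np.uint8) << bit)
    return codes

def weber_excitation(img, eps=1e-6):
    """Weber differential excitation: arctan of the summed relative
    difference between a pixel and its 8 neighbours."""
    c = img[1:-1, 1:-1].astype(float)
    acc = np.zeros_like(c)
    for dy in (0, 1, 2):
        for dx in (0, 1, 2):
            if dy == 1 and dx == 1:
                continue
            acc += img[dy:dy + c.shape[0], dx:dx + c.shape[1]] - c
    return np.arctan(acc / (c + eps))

img = np.array([[1, 2, 1], [2, 5, 2], [1, 2, 1]], dtype=float)
code = lbp_codes(img)[0, 0]        # all neighbours below the centre -> code 0
exc = weber_excitation(img)[0, 0]  # negative: neighbours darker than centre
print(code, exc < 0)
```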
Convolutional Neural Network for Building Extraction from High-Resolution Remote Sensing Images
H. Hosseinpoor, F. Samadzadegan
Pub Date: 2020-02-18, DOI: 10.1109/MVIP49855.2020.9187483
Buildings are one of the most important components of a city, and their extraction from high-resolution remote sensing images is used in a wide range of applications such as urban mapping. Due to the complex structure of high-resolution remote sensing images, automatic extraction of buildings has been a challenge in recent years. Fully convolutional networks (FCNs) have shown successful performance in this task. In this research, a method is proposed to improve the well-known UNet network. In the classical UNet model, semantically rich high-level features are fused with high-resolution low-level features through skip connections for pixel-wise segmentation. However, fusing encoder features with the features of the corresponding decoder stage introduces ambiguity into the segmentation results, because the low-level features add noise to the high-level semantic features. We introduce an embedding feature fusion (EFF) block to enhance the fusion of low-level and high-level features. For evaluation, publicly available United States Geological Survey (USGS) high-resolution orthoimagery, with spatial resolutions ranging from 0.15 m to 0.3 m, was used to compare against several state-of-the-art semantic segmentation models. Experimental results show that the proposed architecture improves the extraction of complex buildings from high-resolution remote sensing images.
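The skip-connection fusion being improved here can be illustrated in numpy on (C, H, W) feature arrays. The `gated_fusion` variant below is a hypothetical stand-in for the spirit of the EFF block (letting the semantic decoder features weight the noisy low-level features before concatenation); the paper's actual block design is not reproduced here.

```python
import numpy as np

def skip_concat(decoder_feat, encoder_feat):
    """Classical UNet skip connection: concatenate encoder features with
    the upsampled decoder features along the channel axis (C, H, W)."""
    assert decoder_feat.shape[1:] == encoder_feat.shape[1:]
    return np.concatenate([decoder_feat, encoder_feat], axis=0)

def gated_fusion(decoder_feat, encoder_feat):
    """Hypothetical gated fusion in the spirit of an embedding-fusion
    block: a sigmoid gate from the decoder's channel-mean suppresses
    noisy low-level features before concatenation (NOT the paper's EFF)."""
    gate = 1.0 / (1.0 + np.exp(-decoder_feat.mean(axis=0, keepdims=True)))
    return np.concatenate([decoder_feat, gate * encoder_feat], axis=0)

dec = np.random.rand(64, 16, 16)   # high-level decoder features
enc = np.random.rand(32, 16, 16)   # low-level encoder features
fused = gated_fusion(dec, enc)
print(fused.shape)                 # (96, 16, 16)
```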
Estimating intrinsic image from successive images by solving underdetermined and overdetermined systems of the dichromatic model
K. Ansari, Alexandre Krebs, Y. Benezeth, F. Marzani
Pub Date: 2020-02-18, DOI: 10.1109/MVIP49855.2020.9187487
Estimating an intrinsic image from a sequence of successive images of an object taken at different illumination angles is useful in applications such as object recognition and color classification, because it provides more visual information. According to the well-known dichromatic model, each image can be considered a linear combination of three components: the intrinsic image, a shading factor, and specularity. In this study, two simple, independent, constrained, parallelized quadratic programming steps were first used to compute the shading factor and specularity of each of the successive images. Only the mean and standard deviation of the three channels of each pixel are required to solve the underdetermined problem posed by the dichromatic model equations. Singular value decomposition was then used to estimate a unique intrinsic image from the shading factors and specularities of all the images, which together constitute an overdetermined problem. The successive images reconstructed with the estimated intrinsic image showed improved visual quality and a wider color gamut.
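The overdetermined step can be illustrated for a single pixel: once the per-image shading and specular factors are known, the k RGB observations form a linear system in the intrinsic colour, solvable by SVD-based least squares. A noise-free numpy sketch under the stated dichromatic model (illuminant colour and all values here are illustrative assumptions):

```python
import numpy as np

# Dichromatic model for one pixel across k images:
#   I_k = shading_k * R + specular_k * L
# With shading_k and specular_k already estimated, the k RGB observations
# form an overdetermined linear system in the intrinsic colour R, solved
# by least squares (SVD under the hood).
rng = np.random.default_rng(0)
R_true = np.array([0.6, 0.3, 0.1])           # intrinsic (body) colour
L = np.array([1.0, 1.0, 1.0])                # assumed white illuminant
shading = rng.uniform(0.2, 1.0, size=5)      # per-image shading factors
specular = rng.uniform(0.0, 0.3, size=5)     # per-image specular factors

I_obs = shading[:, None] * R_true + specular[:, None] * L   # (5, 3) observations
A = np.kron(shading[:, None], np.eye(3))                    # (15, 3) stacked system
b = (I_obs - specular[:, None] * L).ravel()                 # remove specular part
R_est, *_ = np.linalg.lstsq(A, b, rcond=None)
print(np.allclose(R_est, R_true))            # exact recovery in the noise-free case
```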
A Deep Convolutional Neural Network based on Local Binary Patterns of Gabor Features for Classification of Hyperspectral Images
Obeid Sharifi, M. Mokhtarzade, B. Asghari Beirami
Pub Date: 2020-02-18, DOI: 10.1109/MVIP49855.2020.9187486
To date, various spatial-spectral methods have been proposed for accurate classification of hyperspectral images (HSI). Gabor spatial features are among the most prominent, extracting shallow features such as edges and structures. In recent years, convolutional neural networks (CNNs) have been promising for HSI classification. Although Gabor features have been used as input to deep models in the literature, the performance of CNNs can be improved by two-stage textural features based on local binary patterns of Gabor features. In this paper, the input features of the CNN are local binary patterns computed over Gabor features, which are more discriminative than either Gabor features or local binary pattern features alone. Experiments on the well-known Indian Pines HSI prove the superiority of the proposed method over several other deep learning-based methods.
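The first stage of such a pipeline is a bank of Gabor filters; their responses would then be encoded with LBP before feeding the CNN. A minimal numpy construction of a Gabor kernel bank (the parameter values are illustrative, not the paper's):

```python
import numpy as np

def gabor_kernel(size=15, sigma=3.0, theta=0.0, lam=6.0, psi=0.0):
    """Real part of a 2-D Gabor filter: a Gaussian envelope modulating
    a cosine plane wave at orientation `theta` and wavelength `lam`."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    xr = x * np.cos(theta) + y * np.sin(theta)
    yr = -x * np.sin(theta) + y * np.cos(theta)
    envelope = np.exp(-(xr**2 + yr**2) / (2 * sigma**2))
    return envelope * np.cos(2 * np.pi * xr / lam + psi)

# A small bank over 4 orientations; convolving a band with each kernel
# yields the Gabor feature maps whose LBP codes would feed the CNN.
bank = [gabor_kernel(theta=t) for t in np.linspace(0, np.pi, 4, endpoint=False)]
print(len(bank), bank[0].shape)    # 4 kernels of shape (15, 15)
```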
Offline Handwritten Signature Verification and Recognition Based on Deep Transfer Learning
A. Foroozandeh, Ataollah Askari Hemmat, Hossein Rabbani
Pub Date: 2020-02-18, DOI: 10.1109/MVIP49855.2020.9187481
Recently, deep convolutional neural networks have been successfully applied in different fields of computer vision and pattern recognition. Offline handwritten signatures are one of the most important biometrics used in banking systems and administrative and financial applications, and their automatic processing remains a challenging task. The aim of this study is to review signature verification/recognition methods based on convolutional neural networks and to evaluate the performance of some prominent deep convolutional neural networks as feature extractors for offline handwritten signature verification/recognition using transfer learning. This is done with four pretrained models widely used in general computer vision tasks (VGG16, VGG19, ResNet50, and InceptionV3) and two models presented specifically for signature processing (SigNet and SigNet-F). Experiments were conducted on two benchmark Latin signature datasets, the GPDS Synthetic signature dataset and MCYT-75, and two Persian datasets, UTSig and FUM-PHSD. The experimental results, compared with the literature, verify the effectiveness of VGG16 and SigNet for signature verification and the superiority of VGG16 for signature recognition.
A High-Accuracy, Cost-Effective People Counting Solution Based on Visual Depth Data
Seyed Ali Hosseini Shamoushaki, Mohammad Mostafa Talebi, Amineh Mazandarani, S. Hosseini
Pub Date: 2020-02-18, DOI: 10.1109/MVIP49855.2020.9187482
Real-time people counting has become a critical task due to its applications in a wide range of areas such as security, safety, statistics, and commerce, and the demand for systems offering this capability has risen accordingly. It is therefore important to make a precise, robust people counting system affordable to the public. We propose an efficient solution that requires only low-cost hardware, so that a product derived from it can have a reasonable purchase price. Following this minimal hardware requirement, our system relies only on a depth camera and a cheap embedded processor. A detection/tracking module forms the core of the approach; its main functionality is to detect and count entry/exit events through a generic entrance. Our testing and validation experiments show that the proposed system yields highly satisfactory accuracy and competes closely with similar technologies currently on the market.
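Counting entry/exit events from a tracked detection typically amounts to checking when the track crosses a virtual line at the entrance. The sketch below uses the common downward-is-entry convention on head-centroid tracks; the paper does not specify its exact rule, so treat this as an assumed illustration.

```python
def count_crossings(track, line_y=100):
    """Count entries/exits as a tracked head centroid crosses a virtual
    line in image coordinates: downward crossing = entry, upward = exit
    (a common convention; the paper's exact rule is not specified)."""
    entries = exits = 0
    for (_, y0), (_, y1) in zip(track, track[1:]):
        if y0 < line_y <= y1:      # moved from above the line to below it
            entries += 1
        elif y1 < line_y <= y0:    # moved from below the line to above it
            exits += 1
    return entries, exits

# A person walks in (downward past y=100), then back out (upward).
track = [(50, 80), (52, 95), (53, 110), (55, 120), (54, 90)]
print(count_crossings(track))      # (1, 1)
```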
A New Composite Multimodality Image Fusion Method Based on Shearlet Transform and Retina Inspired Model
Mohammadmahdi Sayadi, H. Ghassemian, Reza Naimi, M. Imani
Pub Date: 2020-02-01, DOI: 10.1109/MVIP49855.2020.9116919
Medical imaging is a very important element of disease diagnosis. MRI images carry structural information, while PET images carry functional information, but no medical imaging device captures both simultaneously; image fusion techniques are therefore used. This work concentrates on PET-MRI fusion based on a combination of a retina-inspired model and the non-subsampled shearlet transform. In the first step, the high-frequency component is obtained by applying the shearlet transform to the MRI image, which produces sub-images at several scales and directions; adding these sub-images together reconstructs a single edge image. In the second step, the PET image is transferred from RGB color space to IHS color space, and the low-frequency component is produced by applying a Gaussian low-pass filter to the luminance channel. Adding the low-frequency and high-frequency components together and transferring the result from IHS back to RGB yields the fused image.
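The final combine-and-convert step can be sketched as an intensity-substitution fusion: low-pass the PET luminance, add the MRI edge image, and rescale the PET colours by the new/old intensity ratio. This simplification uses the RGB mean in place of a true IHS transform and a box blur in place of the Gaussian; both are stand-ins, not the paper's pipeline.

```python
import numpy as np

def fuse_ihs(pet_rgb, mri_edges):
    """Intensity-substitution sketch of the fusion: low-pass the PET
    luminance (RGB mean here, a simplification of the IHS transform),
    add the MRI high-frequency edge image, and rescale the PET colours
    by the new/old intensity ratio."""
    intensity = pet_rgb.mean(axis=2)
    kernel = np.ones((3, 3)) / 9.0             # box blur stands in for Gaussian
    pad = np.pad(intensity, 1, mode='edge')
    low = np.zeros_like(intensity)
    for dy in range(3):                        # numpy-only 3x3 convolution
        for dx in range(3):
            low += kernel[dy, dx] * pad[dy:dy + intensity.shape[0],
                                        dx:dx + intensity.shape[1]]
    fused_intensity = low + mri_edges
    ratio = fused_intensity / np.maximum(intensity, 1e-6)
    return pet_rgb * ratio[..., None]

pet = np.full((4, 4, 3), 0.5)                  # flat toy PET image
edges = np.zeros((4, 4))
edges[1, 1] = 0.2                              # one MRI edge pixel
fused = fuse_ihs(pet, edges)
print(fused.shape)                             # (4, 4, 3)
```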
Fast Prediction of Cortical Dementia Based on Original Brain MRI images Using Convolutional Neural Network
M. Amini, H. Sajedi, Tayeb Mahmoodi, S. Mirzaei
Pub Date: 2020-02-01, DOI: 10.1109/MVIP49855.2020.9116921
Fast and automatic identification of different types of cortical dementia, especially Alzheimer's disease, from brain MRI images is a crucial technology that can help physicians provide early and effective treatment. Although preprocessing MRI images can improve the accuracy of machine learning techniques for classifying normal and abnormal cases, it slows down automatic identification and limits the applicability of these methods in clinics and laboratories. In this paper we examine the classification of a small sample of original (unpreprocessed) brain MRI images using a 2D convolutional neural network (CNN). The data, collected at the National Brain Mapping Center of Iran, consist of 172 healthy individuals as the control group (HC) and 89 patients with different grades of dementia (DP). The model achieved an accuracy of 97.47% on the test set and 93.88% under 5-fold cross-validation.
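The 5-fold protocol behind the 93.88% figure can be sketched as a plain index splitter over the 261 subjects (172 HC + 89 DP). The shuffling seed and split mechanics are illustrative; the paper's actual partition is not published here.

```python
import random

def kfold_indices(n, k=5, seed=0):
    """Yield (train, test) index lists for k-fold cross-validation:
    shuffle once, slice into k disjoint folds, rotate the test fold."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [j for f in folds[:i] + folds[i + 1:] for j in f]
        yield train, test

n = 261                                   # 172 HC + 89 DP subjects
splits = list(kfold_indices(n))
sizes = [len(test) for _, test in splits]
print(len(splits), sum(sizes))            # 5 folds covering all 261 subjects
```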
Region Proposal Generation: A Hierarchical Merging Similarity-Based Algorithm
M. Taghizadeh, A. Chalechale
Pub Date: 2020-02-01, DOI: 10.1109/MVIP49855.2020.9116912
This paper presents a hierarchical region merging algorithm that aims to produce a powerful pool of regions for solving computer vision problems. An image is first represented by a graph in which each node is a superpixel. A variety of features is extracted from each region, which is then merged with neighboring regions according to the proposed algorithm: adjacent regions are combined based on a similarity metric and a threshold parameter, and applying different threshold values yields a wide range of regions. The algorithm provides accurate regions that can be represented as bounding boxes or segmentation candidates. For an extensive evaluation, the effectiveness of the individual features and their combinations is analyzed on the MSRC and VOC2012 segmentation datasets. The results show a great improvement in overlap compared to segmentation algorithms, and the method outperforms previous region proposal algorithms, achieving relatively high recall especially at higher overlaps (≥ 0.6).
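One pass of threshold-based merging over the superpixel graph can be sketched with union-find: adjacent regions whose feature similarity exceeds the threshold are united, and sweeping the threshold produces the hierarchy of region pools. The cosine similarity and toy features below are illustrative assumptions, not the paper's exact metric.

```python
import numpy as np

def merge_regions(features, edges, threshold=0.9):
    """Greedy merging over a superpixel graph: adjacent regions whose
    cosine similarity exceeds `threshold` are united via union-find.
    `features` is an (n, d) array of per-region descriptors; `edges`
    lists adjacent region pairs."""
    parent = list(range(len(features)))

    def find(i):                         # find root with path compression
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in edges:
        a, b = features[i], features[j]
        sim = a @ b / (np.linalg.norm(a) * np.linalg.norm(b))
        if sim > threshold:
            parent[find(i)] = find(j)

    return [find(i) for i in range(len(features))]

feats = np.array([[1.0, 0.0], [0.99, 0.1], [0.0, 1.0]])
labels = merge_regions(feats, edges=[(0, 1), (1, 2)])
print(labels)   # regions 0 and 1 merge; region 2 stays separate
```

Running the merge again on the merged regions' pooled features, at a lower threshold, would give the next level of the hierarchy.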