Pub Date : 2019-12-01 DOI: 10.1109/DICTA47822.2019.8946005
STCEC: A Remote Sensing Dataset for Identifying Spatial-Temporal Change in Homogeneous and Heterogeneous Environments
Thaer F. Ali, A. Woodley
Standard experimental datasets permit comprehensive comparison between approaches. These datasets are ubiquitous in many data science domains but uncommon in remote sensing. This paper presents the Spatial-Temporal Change in Environmental Context (STCEC) dataset, an experimental remote sensing dataset that contains changes (and non-changes) in homogeneous and heterogeneous environments, thereby enabling researchers to test their approaches in different contexts. STCEC was tested with five pixel interpolation approaches and showed a significant difference between changes in homogeneous and heterogeneous environments. It is hoped that the dataset will be used by other researchers in future work.
{"title":"STCEC: A Remote Sensing Dataset for Identifying Spatial-Temporal Change in Homogeneous and Heterogeneous Environments","authors":"Thaer F. Ali, A. Woodley","doi":"10.1109/DICTA47822.2019.8946005","DOIUrl":"https://doi.org/10.1109/DICTA47822.2019.8946005","url":null,"abstract":"Standard experimental datasets permit comprehensive analysis between approaches. These datasets are ubiquitous in many data science domains but uncommon in remote sensing. This paper presents the Spatial-Temporal Change in Environmental Context (STCEC) dataset, an experimental remote sensing dataset that contains changes (and non-changes) in homogeneous and heterogeneous environments, thereby, enabling researchers to test their approaches in different contexts. STCEC was tested with five pixel interpolation approaches and showed a significant difference between changes in homogeneous and heterogeneous environments. It is hoped that the dataset will be used by other researchers in future work.","PeriodicalId":6696,"journal":{"name":"2019 Digital Image Computing: Techniques and Applications (DICTA)","volume":"36 1","pages":"1-8"},"PeriodicalIF":0.0,"publicationDate":"2019-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"78862454","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2019-12-01 DOI: 10.1109/DICTA47822.2019.8945887
SRM Superpixel Merging Framework for Precise Segmentation of Cervical Nucleus
Ratna Saha, M. Bajger, Gobert N. Lee
Cervical nuclei contain important diagnostic characteristics useful for identifying abnormalities in cervical cells. Accurate segmentation of nuclei is therefore the primary step in computer-aided diagnosis. However, cell overlapping, uneven staining, poor contrast, and the presence of debris make this task challenging. A novel method is presented in this paper to detect and segment nuclei from overlapping cervical smear images. The proposed framework segments nuclei by merging superpixels generated by the statistical region merging (SRM) algorithm, using pairwise regional contrasts and gradient boundaries. To overcome the difficulty of finding the optimal parameter value that controls the coarseness of the segmentation, a new approach for SRM superpixel generation was introduced. Quantitative and qualitative assessment of the proposed framework is carried out using the Overlapping Cervical Cytology Image Segmentation Challenge (ISBI 2014) dataset of 945 cervical images. In comparison with state-of-the-art methods, the proposed methodology achieved superior segmentation performance in terms of Dice similarity coefficient (0.956) and pixel-based recall (0.962). Other evaluation measures, such as pixel-based precision (0.930) and object-based precision (0.987) and recall (0.944), also compare favorably with recently published studies. The experimental results demonstrate that the proposed framework can precisely segment nuclei from overlapping cervical cell images while maintaining high levels of precision and recall. The developed framework may therefore assist cytologists in computerized cervical cell analysis and help with early diagnosis of cervical cancer.
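The abstract leaves the exact merging predicate to the paper; as a hedged illustration of the stated idea, merging adjacent superpixels by pairwise regional contrast and gradient boundaries, consider the following sketch (the thresholds and interfaces are hypothetical, not the authors' SRM-based criterion):

```python
import numpy as np

def should_merge(region_a, region_b, boundary_grad, t_contrast, t_gradient):
    """Pairwise merging test for two adjacent superpixels.

    region_a, region_b: 1-D arrays of member-pixel intensities;
    boundary_grad: gradient magnitudes sampled along their shared boundary.
    Merge when the regions look alike AND the edge between them is weak;
    both thresholds are free parameters of this sketch.
    """
    contrast = abs(region_a.mean() - region_b.mean())
    return bool(contrast < t_contrast and boundary_grad.mean() < t_gradient)

# Toy usage: similar intensities across a weak boundary -> merge.
a = np.array([0.52, 0.55, 0.54])
b = np.array([0.50, 0.53, 0.51])
edge = np.array([0.05, 0.08, 0.04])
print(should_merge(a, b, edge, t_contrast=0.1, t_gradient=0.2))  # True
```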
{"title":"SRM Superpixel Merging Framework for Precise Segmentation of Cervical Nucleus","authors":"Ratna Saha, M. Bajger, Gobert N. Lee","doi":"10.1109/DICTA47822.2019.8945887","DOIUrl":"https://doi.org/10.1109/DICTA47822.2019.8945887","url":null,"abstract":"Cervical nuclei contain important diagnostic characteristics useful for identifying abnormality in cervical cells. Therefore, an accurate segmentation of nuclei is the primary step in computer-aided diagnosis. However, cell overlapping, uneven staining, poor contrast, and presence of debris elements make this task challenging. A novel method is presented in this paper to detect and segment nuclei from overlapping cervical smear images. The proposed framework segments nuclei by merging superpixels generated by statistical region merging (SRM) algorithm using pairwise regional contrasts and gradient boundaries. To overcome the limitation of finding the optimal parameter value, which controls the coarseness of the segmentation, a new approach for SRM superpixel generation was introduced. Quantitative and qualitative assessment of the proposed framework is carried out using Overlapping Cervical Cytology Image Segmentation Challenge — ISBI 2014 dataset of 945 cervical images. In comparison with the state-of-the-art methods, the proposed methodology achieved superior segmentation performance in terms of Dice similarity coefficient 0.956 and pixel-based recall 0.962. Other evaluation measures such as pixel-based precision 0.930, object-based precision 0.987, and recall 0.944, also compare favorably with some recently published studies. The experimental results demonstrate that the proposed framework can precisely segment nuclei from overlapping cervical cell images, while keeping high level of precision and recall. Therefore, the developed framework may assist cytologists in computerized cervical cell analysis and help with early diagnosis of cervical cancer.","PeriodicalId":6696,"journal":{"name":"2019 Digital Image Computing: Techniques and Applications (DICTA)","volume":"49 1","pages":"1-8"},"PeriodicalIF":0.0,"publicationDate":"2019-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"85919339","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2019-12-01 DOI: 10.1109/DICTA47822.2019.8945919
Real-Time Human Gaze Estimation
T. Rowntree, C. Pontecorvo, I. Reid
This paper describes a system for estimating the coarse gaze, or 1D head pose, of multiple people in a video stream from a moving camera in an indoor scene. The system runs at 30 Hz, detects human heads with an F-score of 87.2%, and predicts their gaze with an average error of 20.9°, including when subjects are facing directly away from the camera. The system uses two Convolutional Neural Networks (CNNs), one for head detection and one for gaze estimation, and uses common tracking and filtering techniques to smooth predictions over time. The paper is application-focused, so it describes the individual components of the system as well as the techniques used for collecting data and training the CNNs.
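The abstract does not define the error metric, but an average error for 1D head pose that includes people facing directly away from the camera implies a wrap-around angular difference. A minimal sketch of such a metric (an assumption, not the paper's stated definition):

```python
def angular_error(pred_deg, true_deg):
    """Smallest absolute difference between two angles, in degrees.

    Yaw wraps around 360 degrees: predictions of 350 and 10 are 20 degrees
    apart, not 340. Averaging such errors over a test set would yield a
    figure like the reported 20.9 degrees.
    """
    d = abs(pred_deg - true_deg) % 360.0
    return min(d, 360.0 - d)

assert angular_error(350.0, 10.0) == 20.0
assert angular_error(179.0, -179.0) == 2.0
```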
{"title":"Real-Time Human Gaze Estimation","authors":"T. Rowntree, C. Pontecorvo, I. Reid","doi":"10.1109/DICTA47822.2019.8945919","DOIUrl":"https://doi.org/10.1109/DICTA47822.2019.8945919","url":null,"abstract":"This paper describes a system for estimating the course gaze or 1D head pose of multiple people in a video stream from a moving camera in an indoor scene. The system runs at 30 Hz and can detect human heads with a F-Score of 87.2% and predict their gaze with an average error 20.9° including when they are facing directly away from the camera. The system uses two Convolutional Neural Networks (CNNs) for head detection and gaze estimation respectively and uses common tracking and filtering techniques for smoothing predictions over time. This paper is application-focused and so describes the individual components of the system as well as the techniques used for collecting data and training the CNNs.","PeriodicalId":6696,"journal":{"name":"2019 Digital Image Computing: Techniques and Applications (DICTA)","volume":"22 1","pages":"1-7"},"PeriodicalIF":0.0,"publicationDate":"2019-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"85938779","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2019-12-01 DOI: 10.1109/DICTA47822.2019.8946003
Bi-SAN-CAP: Bi-Directional Self-Attention for Image Captioning
Md. Zakir Hossain, F. Sohel, M. F. Shiratuddin, Hamid Laga, Bennamoun
In a typical image captioning pipeline, a Convolutional Neural Network (CNN) is used as the image encoder and a Long Short-Term Memory (LSTM) network as the language decoder. LSTM with an attention mechanism has shown remarkable performance on sequential data, including image captioning, and can retain long-range dependencies in sequential data. However, the computations of an LSTM are hard to parallelize because of its inherently sequential nature. To address this issue, recent works have shown the benefits of self-attention, which is highly parallelizable and requires no temporal dependencies. However, existing techniques apply attention in only one direction to compute the context of the words. We propose an attention mechanism called Bi-directional Self-Attention (Bi-SAN) for image captioning, which computes attention in both forward and backward directions and achieves performance comparable to state-of-the-art methods.
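The abstract leaves the formulation to the paper. One plausible reading, sketched below in NumPy, is standard scaled dot-product self-attention run twice, once with a forward (past-only) mask and once with a backward (future-only) mask; concatenating the two context vectors is an assumed fusion step, not confirmed by the abstract:

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def directional_self_attention(X, Wq, Wk, Wv, forward=True):
    """Scaled dot-product self-attention restricted to one direction.

    X: (T, d) sequence. A lower-triangular mask lets each position attend
    to itself and the past (forward); its transpose attends to itself and
    the future (backward).
    """
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[-1])          # (T, T)
    mask = np.tril(np.ones(scores.shape, dtype=bool))
    if not forward:
        mask = mask.T
    scores = np.where(mask, scores, -1e9)            # block the other side
    return softmax(scores) @ V

# Assumed fusion: concatenate the forward and backward context vectors.
rng = np.random.default_rng(0)
T, d = 5, 8
X = rng.normal(size=(T, d))
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))
fwd = directional_self_attention(X, Wq, Wk, Wv, forward=True)
bwd = directional_self_attention(X, Wq, Wk, Wv, forward=False)
context = np.concatenate([fwd, bwd], axis=-1)        # (T, 2d)
```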
{"title":"Bi-SAN-CAP: Bi-Directional Self-Attention for Image Captioning","authors":"Md. Zakir Hossain, F. Sohel, M. F. Shiratuddin, Hamid Laga, Bennamoun","doi":"10.1109/DICTA47822.2019.8946003","DOIUrl":"https://doi.org/10.1109/DICTA47822.2019.8946003","url":null,"abstract":"In a typical image captioning pipeline, a Convolutional Neural Network (CNN) is used as the image encoder and Long Short-Term Memory (LSTM) as the language decoder. LSTM with attention mechanism has shown remarkable performance on sequential data including image captioning. LSTM can retain long-range dependency of sequential data. However, it is hard to parallelize the computations of LSTM because of its inherent sequential characteristics. In order to address this issue, recent works have shown benefits in using self-attention, which is highly parallelizable without requiring any temporal dependencies. However, existing techniques apply attention only in one direction to compute the context of the words. We propose an attention mechanism called Bi-directional Self-Attention (Bi-SAN) for image captioning. It computes attention both in forward and backward directions. It achieves high performance comparable to state-of-the-art methods.","PeriodicalId":6696,"journal":{"name":"2019 Digital Image Computing: Techniques and Applications (DICTA)","volume":"12 1","pages":"1-7"},"PeriodicalIF":0.0,"publicationDate":"2019-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"74464655","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2019-12-01 DOI: 10.1109/DICTA47822.2019.8945888
Evaluation of the Impact of Image Spatial Resolution in Designing a Context-Based Fully Convolution Neural Networks for Flood Mapping
Chandrama Sarker, Luis Mejías Alvarez, F. Maire, A. Woodley
In this paper, our main aim is to investigate context-based pixel-wise classification using a fully convolutional neural network model for flood extent mapping from multispectral remote sensing images. Our approach helps to overcome the limitation of conventional classification methods, which rely on per-pixel spectral information and therefore generalise poorly. In this study, a comparative analysis with a conventional pixel-wise SVM classifier shows that our proposed model has higher generalisation ability for flooded-area detection. By using remote sensing images with different spatial resolutions, we also investigate the relationship between image-sensor resolution and neighbourhood window size for context-based classification. Instead of fine-tuning a pre-established deep neural network, we developed a preliminary base model with two convolutional layers. The model was tested on images with two different spatial resolutions: 3 meters (PlanetScope) and 30 meters (Landsat-5 Thematic Mapper). During the training phase, we determined the structure of the convolutional layers as well as the appropriate size of the contextual neighbourhood for these two data types. Preliminary results showed that as the spatial resolution increases, the required neighbourhood size for the training samples also increases. We trained the model with training samples of different neighbourhood sizes, and analysis of the resulting models showed that an 11 × 11 neighbourhood window for PlanetScope data and a 3 × 3 neighbourhood window for Landsat data were the optimum sizes for classification. Insights from this work may be used to design efficient classifiers in scenarios where data with different resolutions are available.
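As a concrete illustration of the neighbourhood-window idea (the paper's actual data pipeline is not described in the abstract), the sketch below extracts an odd-sized context window around every pixel for a patch-based classifier:

```python
import numpy as np

def extract_patches(image, window):
    """Extract a (window x window) neighbourhood around every pixel.

    image: (H, W, C) multispectral array; window must be odd, e.g. 11 for
    3 m PlanetScope data or 3 for 30 m Landsat data per the paper's
    findings. Returns (H*W, window, window, C) patches; this is
    memory-hungry, and a real pipeline would generate patches in batches.
    """
    assert window % 2 == 1, "window size must be odd"
    pad = window // 2
    padded = np.pad(image, ((pad, pad), (pad, pad), (0, 0)), mode="reflect")
    H, W, C = image.shape
    patches = np.empty((H * W, window, window, C), dtype=image.dtype)
    for i in range(H):
        for j in range(W):
            patches[i * W + j] = padded[i:i + window, j:j + window, :]
    return patches

# e.g. patches = extract_patches(scene, 11); a CNN then labels each patch.
```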
{"title":"Evaluation of the Impact of Image Spatial Resolution in Designing a Context-Based Fully Convolution Neural Networks for Flood Mapping","authors":"Chandrama Sarker, Luis Mejías Alvarez, F. Maire, A. Woodley","doi":"10.1109/DICTA47822.2019.8945888","DOIUrl":"https://doi.org/10.1109/DICTA47822.2019.8945888","url":null,"abstract":"In this paper, our main aim is to investigate the context-based pixel-wise classification using a fully convolutional neural networks model for flood extent mapping from multispectral remote sensing images. Our approach helps to overcome the limitation of the conventional classification methods with low generalisation ability that used per-pixel spectral information for pixel-wise classification. In this study, a comparative analysis with conventional pixel-wise SVM classifier shows that our proposed model has higher generalisation ability for flooded area detection. By using remote sensing images with different spatial resolutions we also aim to investigate the relationship between image-sensor resolution and neighbourhood window size for context-based classification. Instead of fine-tuning a pre-established deep neural network model, we developed a preliminary base model with two convolutional layers. The model was tested on images with two different spatial resolutions of 3 meters (PlanetScope image) and 30 meters (Landsat-5 Thematic Mapper). During training phase we determined the structure of the convolutional layer as well as the appropriate size of the contextual neighbourhood for those two data types. Preliminary results showed that with increasing the scale of spatial resolutions the required neighbourhood size for training samples also increases. We tested different neighbourhood sized training samples to train the model and the analysis of the performance of those models showed that a 11 × 11 neighbourhood window for PlanetScope data and a 3 × 3 neighbourhood window for Landsat data were found to be the optimum size for classification. Insights from this work may be used to design efficient classifiers in scenarios where data with different resolutions are available.","PeriodicalId":6696,"journal":{"name":"2019 Digital Image Computing: Techniques and Applications (DICTA)","volume":"12 1","pages":"1-8"},"PeriodicalIF":0.0,"publicationDate":"2019-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"81043463","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2019-12-01 DOI: 10.1109/DICTA47822.2019.8945902
Using Image Processing to Automatically Measure Pearl Oyster Size for Selective Breeding
Adrian Lapico, M. Sankupellay, Louis Cianciullo, Trina S. Myers, D. Konovalov, D. Jerry, P. Toole, David B. Jones, K. Zenger
Growth rate is a genetic trait that is often recorded in pearl oyster farming for use in selective breeding programs. By tracking the growth rate of a pearl oyster, farmers can make better decisions on which oysters to breed or manage in order to produce healthier offspring and higher quality pearls. However, the current practice of measurement by hand results in measurement inaccuracies, slow processing, and unnecessary labour costs. To rectify this, we propose automating the workflow via computer vision techniques, capturing images of pearl oysters and processing them to obtain absolute measurements of each oyster. Specifically, we utilise and compare a set of edge detection algorithms to produce an image-processing algorithm that automatically segments an image containing multiple oysters and returns the height and width of each oyster shell. Our final algorithm was tested on images containing 2523 oysters (Pinctada maxima) captured on farming boats in Indonesia, and achieved a reliability of 92.1%, defined as identifying at least one required oyster measurement correctly.
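A hedged OpenCV sketch of the kind of pipeline described, edge detection followed by contour extraction and rotated bounding boxes converted to physical units; the blur kernel, Canny thresholds, debris cut-off, and mm_per_pixel calibration are assumptions rather than the authors' settings:

```python
import cv2
import numpy as np

def measure_oysters(image_bgr, mm_per_pixel):
    """Measure oyster shells via edge detection; returns (height, width) in mm.

    Simplified sketch: Canny edges -> closed contours -> rotated bounding
    boxes. All numeric parameters here are assumed values.
    """
    gray = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2GRAY)
    blurred = cv2.GaussianBlur(gray, (5, 5), 0)
    edges = cv2.Canny(blurred, 50, 150)
    # Close small gaps so each shell forms a single outer contour.
    edges = cv2.morphologyEx(edges, cv2.MORPH_CLOSE, np.ones((5, 5), np.uint8))
    contours, _ = cv2.findContours(edges, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    sizes = []
    for c in contours:
        if cv2.contourArea(c) < 1000:           # skip debris (assumed cut-off)
            continue
        (_, _), (w, h), _ = cv2.minAreaRect(c)  # rotated bounding box
        sizes.append((max(w, h) * mm_per_pixel, min(w, h) * mm_per_pixel))
    return sizes
```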
{"title":"Using Image Processing to Automatically Measure Pearl Oyster Size for Selective Breeding","authors":"Adrian Lapico, M. Sankupellay, Louis Cianciullo, Trina S. Myers, D. Konovalov, D. Jerry, P. Toole, David B. Jones, K. Zenger","doi":"10.1109/DICTA47822.2019.8945902","DOIUrl":"https://doi.org/10.1109/DICTA47822.2019.8945902","url":null,"abstract":"The growth rate is a genetic trait that is often recorded in pearl oyster farming for use in selective breeding programs. By tracking the growth rate of a pearl oyster, farmers can make better decisions on which oysters to breed or manage in order to produce healthier offspring and higher quality pearls. However, the current practice of measurement by hand results in measurement inaccuracies, slow processing, and unnecessary employee costs. To rectify this, we propose automating the workflow via computer vision techniques, which can be used to capture images of pearl oysters and process the images to obtain the absolute measurements of each oyster. Specifically, we utilise and compare a set of edge detection algorithms to produce an image-processing algorithm that automatically segments an image containing multiple oysters and returns the height and width of the oyster shell. Our final algorithm was tested on images containing 2523 oysters (Pinctada maxima) captured on farming boats in Indonesia. This algorithm achieved reliability (of identifying at least one required oyster measurement correctly) equal to 92.1%.","PeriodicalId":6696,"journal":{"name":"2019 Digital Image Computing: Techniques and Applications (DICTA)","volume":"58 1","pages":"1-8"},"PeriodicalIF":0.0,"publicationDate":"2019-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"79136564","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2019-12-01 DOI: 10.1109/DICTA47822.2019.8946022
Registration Based Data Augmentation for Multiple Sclerosis Lesion Segmentation
Ava Assadi Abolvardi, Len Hamey, K. Ho-Shon
Deep learning has shown outstanding performance on various computer vision tasks such as image segmentation. To take advantage of deep learning in image segmentation, a huge amount of annotated data is needed, since deep learning models are data-intensive. One of the main challenges of using deep learning methods in the medical domain is the shortage of available annotated data. To tackle this problem, we propose a registration-based framework for augmenting multiple sclerosis datasets. In this framework, by registering images of two different patients, we create a new image that smoothly adds lesions from the first patient into a brain image structured like the second patient's. Because multiple sclerosis lesions vary in shape, size, location, and number of occurrences, registering images of two different subjects creates a realistic image. The proposed method introduces diversity into the data distribution that traditional augmentation methods do not offer. To check the effectiveness of the proposed method, we compare the performance of a 3D U-Net on different augmented and non-augmented datasets. Experimental results indicate that the best performance is achieved when the proposed method is combined with traditional augmentation techniques.
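A conceptual sketch of the lesion-transplant step, assuming a deformable registration has already produced a displacement field from target to source coordinates (the registration algorithm and the smooth-blending details are not given in the abstract):

```python
import numpy as np
from scipy.ndimage import map_coordinates

def transplant_lesions(source_img, source_lesions, target_img, disp_field):
    """Warp a source patient's lesions into a target patient's brain image.

    disp_field: (3, D, H, W) displacement (in voxels) mapping each target
    voxel to its corresponding source location, e.g. the output of a
    deformable registration step. source_lesions is a binary lesion mask.
    """
    # Sampling grid: identity coordinates plus the displacement field.
    coords = np.indices(target_img.shape).astype(np.float64) + disp_field
    warped_src = map_coordinates(source_img, coords, order=1)
    warped_mask = map_coordinates(source_lesions.astype(np.float64),
                                  coords, order=1) > 0.5
    out = target_img.copy()
    out[warped_mask] = warped_src[warped_mask]   # paste lesion voxels
    return out, warped_mask
```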
{"title":"Registration Based Data Augmentation for Multiple Sclerosis Lesion Segmentation","authors":"Ava Assadi Abolvardi, Len Hamey, K. Ho-Shon","doi":"10.1109/DICTA47822.2019.8946022","DOIUrl":"https://doi.org/10.1109/DICTA47822.2019.8946022","url":null,"abstract":"Deep learning has shown outstanding performance on various computer vision tasks such as image segmentation. To take advantage of deep learning in image segmentation, one would need a huge amount of annotated data since deep learning models are data-intensive. One of the main challenges of using deep learning methods in the medical domain is the shortage of available annotated data. To tackle this problem, in this paper, we propose a registration based framework for augmenting multiple sclerosis datasets. In this framework, by registering images of two different patients, we create a new image, which smoothly adds lesions from the first patient into a brain image, structured like the second patient. Due to their nature, multiple sclerosis lesions vary in shape, size, location and number of occurrence, thus registering images of two different subjects, will create a realistic image. The proposed method is capable of introducing diversity to data distribution, which other traditional augmentation methods do not offer. To check the effectiveness of our proposed method, we compare the performance of 3D-Unet on different augmented and non-augmented datasets. Experimental results indicate that the best performance is achieved when combining both the proposed method with traditional augmentation techniques.","PeriodicalId":6696,"journal":{"name":"2019 Digital Image Computing: Techniques and Applications (DICTA)","volume":"68 1","pages":"1-5"},"PeriodicalIF":0.0,"publicationDate":"2019-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"85812484","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2019-12-01 DOI: 10.1109/DICTA47822.2019.8945980
Incorporating the Barzilai-Borwein Adaptive Step Size into Subgradient Methods for Deep Network Training
A. Robles-Kelly, A. Nazari
In this paper, we incorporate the Barzilai-Borwein [2] step size into gradient descent methods used to train deep networks. This allows us to adapt the learning rate using a two-point approximation to the secant equation upon which quasi-Newton methods are based. Moreover, the adaptive learning rate method presented here is quite general and can be applied to widely used gradient descent approaches such as Adagrad [7] and RMSprop. We evaluate our method using standard example network architectures on widely available datasets and compare against alternatives from the literature. In our experiments, our adaptive learning rate shows smoother and faster convergence than the alternatives, with better or comparable performance.
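The two-point Barzilai-Borwein rule is standard and compact; below is a minimal NumPy sketch of the BB1 step size on a toy quadratic. The paper's integration with Adagrad/RMSprop and any safeguards are not reproduced here:

```python
import numpy as np

def bb_step_size(x_prev, x_curr, g_prev, g_curr, fallback=1e-3):
    """Barzilai-Borwein (BB1) step size from two successive iterates.

    With s = x_k - x_{k-1} and y = g_k - g_{k-1}, the BB1 rule is
    eta = (s.T s) / (s.T y); fall back to a default rate when the
    curvature estimate s.T y is degenerate.
    """
    s = x_curr - x_prev
    y = g_curr - g_prev
    sy = float(np.dot(s, y))
    if abs(sy) < 1e-12:
        return fallback
    return float(np.dot(s, s)) / abs(sy)

# Toy usage on the quadratic f(x) = 0.5 * x.T A x, whose gradient is A x.
A = np.diag([1.0, 10.0])
grad = lambda x: A @ x
x_prev = np.array([1.0, 1.0])
g_prev = grad(x_prev)
x = x_prev - 0.1 * g_prev                 # one fixed-rate warm-up step
for _ in range(20):
    g = grad(x)
    eta = bb_step_size(x_prev, x, g_prev, g)
    x_prev, g_prev = x, g
    x = x - eta * g
print(x)                                  # approaches the minimiser [0, 0]
```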
{"title":"Incorporating the Barzilai-Borwein Adaptive Step Size into Sugradient Methods for Deep Network Training","authors":"A. Robles-Kelly, A. Nazari","doi":"10.1109/DICTA47822.2019.8945980","DOIUrl":"https://doi.org/10.1109/DICTA47822.2019.8945980","url":null,"abstract":"In this paper, we incorporate the Barzilai-Borwein [2] step size into gradient descent methods used to train deep networks. This allows us to adapt the learning rate using a two-point approximation to the secant equation which quasi-Newton methods are based upon. Moreover, the adaptive learning rate method presented here is quite general in nature and can be applied to widely used gradient descent approaches such as Adagrad [7] and RMSprop. We evaluate our method using standard example network architectures on widely available datasets and compare against alternatives elsewhere in the literature. In our experiments, our adaptive learning rate shows a smoother and faster convergence than that exhibited by the alternatives, with better or comparable performance.","PeriodicalId":6696,"journal":{"name":"2019 Digital Image Computing: Techniques and Applications (DICTA)","volume":"25 1","pages":"1-6"},"PeriodicalIF":0.0,"publicationDate":"2019-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"90457223","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2019-12-01 DOI: 10.1109/DICTA47822.2019.8946116
Wave Scale, Speed and Direction from Airborne Video of Maritime Scene
Kent Rosser, J. Chahl
Ocean surfaces and large water bodies are commonly monitored by aircraft. While water features are visually non-static, they contain information that allows water motion to be determined, which has applications in navigation, assessing sub-surface changes, and estimating the drift and size of objects within the scene. This study presents an enhancement of state-of-the-art methods to extract water surface features from imagery acquired by an overhead aircraft, and assesses its performance on a real-world maritime scene.
{"title":"Wave Scale, Speed and Direction from Airborne Video of Maritime Scene","authors":"Kent Rosser, J. Chahl","doi":"10.1109/DICTA47822.2019.8946116","DOIUrl":"https://doi.org/10.1109/DICTA47822.2019.8946116","url":null,"abstract":"Ocean surfaces and large water bodies are commonly monitored by aircraft. While water features are visually non-static, they do include information that allows determination of water motion which has applications in navigation, assessing sub-surface changes and the estimation of drift and size of objects within the scene. This study presents an enhancement of state of the art methods to extract water surface features from imagery acquired by an overhead aircraft and assesses its performance on a real world maritime scene.","PeriodicalId":6696,"journal":{"name":"2019 Digital Image Computing: Techniques and Applications (DICTA)","volume":"47 1","pages":"1-7"},"PeriodicalIF":0.0,"publicationDate":"2019-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"89731573","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2019-12-01 DOI: 10.1109/DICTA47822.2019.8945903
Ensemble of Training Models for Road and Building Segmentation
Ryosuke Kamiya, Kyoya Sawada, K. Hotta
In this paper, we propose an object segmentation method for satellite images based on an ensemble of models obtained through the training process. Whereas recognition accuracy is conventionally improved by ensembling models trained with different random seeds, here we focus on the ensemble of models saved at different points during a single training run. In a model ensemble, we should integrate models with different opinions. Since pixels with low probability, such as those on boundaries, are frequently updated during training, each intermediate model assigns different probabilities to boundary regions, and the ensemble of these probability maps is effective for improving segmentation accuracy. The effectiveness of ensembling training models is demonstrated by experiments on building and road segmentation. Our proposed method improved accuracy by approximately 4% in comparison with the best model selected by validation, and also achieved better accuracy than the standard ensemble of models.
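The core idea, averaging the probability maps of snapshots saved during one training run rather than of independently trained models, can be sketched as follows; the checkpoint schedule and prediction interface are assumptions:

```python
import numpy as np

def ensemble_probability_map(checkpoint_predict_fns, image):
    """Average per-pixel class probabilities over training checkpoints.

    checkpoint_predict_fns: callables mapping an image (H, W, C) to a
    per-pixel probability map (H, W, num_classes), e.g. snapshots of the
    same network saved at several epochs of one training run.
    """
    probs = [fn(image) for fn in checkpoint_predict_fns]
    return np.mean(probs, axis=0)                 # (H, W, num_classes)

# Hypothetical usage: the argmax of the averaged map is the segmentation.
# seg = ensemble_probability_map(snapshot_fns, img).argmax(axis=-1)
```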
{"title":"Ensemble of Training Models for Road and Building Segmentation","authors":"Ryosuke Kamiya, Kyoya Sawada, K. Hotta","doi":"10.1109/DICTA47822.2019.8945903","DOIUrl":"https://doi.org/10.1109/DICTA47822.2019.8945903","url":null,"abstract":"In this paper, we propose an object segmentation method in satellite images by the ensemble of models obtained through training process. To improve recognition accuracy, the ensemble of models obtained by different random seeds is used. Here we pay attention to the ensemble of models obtained through training process. In model ensemble, we should integrate the models with different opinions. Since the pixels with low probability such as boundary are often updated through training process, each model in training process has different probability for boundary regions, and the ensemble of those probability maps is effective for improving segmentation accuracy. Effectiveness of the ensemble of training models is demonstrated by experiments on building and road segmentation. Our proposed method improved approximately 4% in comparison with the best model selected by validation. Our method also achieved better accuracy than the standard ensemble of models.","PeriodicalId":6696,"journal":{"name":"2019 Digital Image Computing: Techniques and Applications (DICTA)","volume":"1 1","pages":"1-6"},"PeriodicalIF":0.0,"publicationDate":"2019-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"89291254","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}