Sign Recognition - How well does Single Shot Multibox Detector sum up? A Quantitative Study
Pub Date: 2018-10-01 | DOI: 10.1109/AIPR.2018.8707409
Manikandan Ravikiran
Deep learning for traffic sign detection & recognition (TSDR) has been widely explored in recent years owing to its ability to produce state-of-the-art results and the availability of public datasets. Two families of detection networks are currently being developed: single-shot and region-proposal-based approaches. Even though single-shot methods seem adequate for traffic sign detection, very few works to date have investigated this hypothesis quantitatively, with most works focusing on region-proposal-based architectures. Moreover, given the complexity of the TSDR task and the limited performance of region-proposal-based approaches, a quantitative study of the single-shot method is warranted, which would in turn reveal its strengths and weaknesses for TSDR. In this paper we revisit this topic through a quantitative evaluation of the state-of-the-art Single Shot MultiBox Detector (SSD) on multiple standard benchmarks. More specifically, we quantify 1) the performance of SSD on multiple existing TSDR benchmarks, namely GTSDB, STSDB and BTSDB; 2) the generalization of SSD across datasets; 3) the impact of class overlap on SSD's performance; and 4) the performance of SSD with synthetically generated datasets built from Wikipedia images. Through our study we show that 1) SSD can reach performance above 0.92 AUC for TSDR across standard benchmarks, and in the process we introduce new benchmarks for Romania (RTSDB) and Finland (FTSDB) in line with GTSDB; 2) an SSD model pretrained on GTSDB generalizes well to BTSDB and RTSDB with an average AUC of 0.90, and comparatively worse to the Sweden and Finland datasets. We find scale selection and information loss to be the primary reasons for the limited generalization; to address these issues we propose a convex-optimization-based scale selection and Skip SSD, an architecture built on the concept of feature reuse, leading to improved generalization. We also show that 3) an SSD model augmented with a small synthetically generated dataset produces close to state-of-the-art accuracy across GTSDB, STSDB and BTSDB, and 4) class overlap remains a challenging problem even for SSD. Finally, we present detailed experiments and summarize practical findings for those interested in getting the most out of SSD for TSDR.
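A minimal sketch (not the authors' evaluation code) of one common way the AUC figures quoted above can be computed for a detection benchmark such as GTSDB: detections are matched to ground-truth boxes by IoU, precision and recall are swept over confidence-ranked detections, and the area under the precision-recall curve is integrated. Function names and the 0.5 IoU threshold are illustrative assumptions.

```python
import numpy as np

def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter + 1e-9)

def detection_auc(detections, ground_truth, iou_thresh=0.5):
    """detections: list of (box, score); ground_truth: list of boxes."""
    detections = sorted(detections, key=lambda d: -d[1])  # highest confidence first
    matched, tps = set(), []
    for box, _score in detections:
        best_j, best_iou = -1, iou_thresh
        for j, gt in enumerate(ground_truth):
            if j in matched:
                continue
            o = iou(box, gt)
            if o >= best_iou:
                best_j, best_iou = j, o
        if best_j >= 0:
            matched.add(best_j)
            tps.append(1)       # true positive
        else:
            tps.append(0)       # false positive
    tps = np.array(tps)
    cum_tp = np.cumsum(tps)
    precision = cum_tp / (np.arange(len(tps)) + 1)
    recall = cum_tp / max(len(ground_truth), 1)
    # Area under the precision-recall curve via trapezoidal integration.
    return float(np.trapz(precision, recall))
```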
{"title":"Sign Recognition - How well does Single Shot Multibox Detector sum up? A Quantitative Study","authors":"Manikandan Ravikiran","doi":"10.1109/AIPR.2018.8707409","DOIUrl":"https://doi.org/10.1109/AIPR.2018.8707409","url":null,"abstract":"Deep learning in traffic sign detection & recognition (TSDR) is widely explored in recent times due to its ability to produce state-of-the-art results and availability of public datasets. Two different architectures of detection networks are currently being developed: Single Shot and Region Proposal based approaches. Even though for the case of traffic sign detection, single shot method seem adequate, very few works to date has investigated this hypothesis quantitatively, with most works focusing on region proposal based detection architectures. Moreover, with the complexity of the TSDR task and limited performance of region proposal based approaches, a quantitative study of the single shot method is warranted which would, in turn, reveal its strengths and weakness for TSDR. As such in this paper, we revisit this topic through quantitative evaluation of state-of-the-art Single Shot Multibox Detector (SSD) on multiple standard benchmarks. More specifically, we try to quantify 1) Performance of SSD over multiple existing TSDR benchmarks namely GTSDB, STSDB and BTSDB 2) Generalization of SSD across the datasets 3) Impact of class overlap on SSD’s performance 4) Performance of SSD from synthetically generated datasets using Wikipedia Images. Through our study, we show that 1) SSD can reach performance >0.92 AUC for TSDR across standard benchmarks and in the process, we introduce new benchmarks for Romania(RTSDB) and Finland(FTSDB) in line with GTSDB 2) SSD model pretrained on GTSDB generalizes well for BTSDB and RTSDB with average AUC of 0.90 and comparatively lower for Sweden and Finland datasets. We find that scale selection and information loss as the primary reason for the limited generalization. In the due process, to address these issues we propose a convex optimization-based scale selection and Skip SSD - An architecture developed based on the concept of feature reuse leading to improvement in generalization. We also show that 3) SSD model augmented with small synthetically generated dataset produces close to state-of-the-art accuracy across GTSDB, STSDB and BTSDB 4) Class overlap is indeed a challenging problem to be addressed even in case of SSD. Further, we show detailed experiments and summarize our practical findings for those interested in getting the most out of SSD for TSDR.","PeriodicalId":230582,"journal":{"name":"2018 IEEE Applied Imagery Pattern Recognition Workshop (AIPR)","volume":"41 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127410326","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Object Recognition under Lighting Variations using Pre-Trained Networks
Pub Date: 2018-10-01 | DOI: 10.1109/AIPR.2018.8707399
Kalpathy Sivaraman, A. Murthy
We report the object-recognition performance of VGG16, ResNet, and SqueezeNet, three state-of-the-art Convolutional Neural Networks (CNNs) trained on ImageNet, across 15 different lighting conditions using the Phos dataset, as well as that of a ResNet-like network trained on PASCAL VOC on the ExDark dataset. Instabilities in the normalized softmax values are used to show that pre-trained networks are not robust to lighting variations. Our investigation yields a robustness-analysis framework for assessing the performance of CNNs under different lighting conditions. The Phos dataset consists of 15 scenes captured under different illumination conditions: 9 images captured under various strengths of uniform illumination and 6 images under different degrees of non-uniform illumination. The ExDark dataset consists of ten scenes under different illumination conditions. A Keras-based pipeline was developed to study the softmax values output by ImageNet-trained VGG16, ResNet, and SqueezeNet for the same object under the 15 lighting conditions of the Phos dataset, and a ResNet architecture was trained end-to-end on the PASCAL VOC dataset. The large variations observed in the softmax values provide empirical evidence of unstable performance and of the need to augment training to account for lighting variations.
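A minimal sketch, assuming TensorFlow/Keras and a directory holding the same scene under different illuminations, of the kind of pipeline described above: run an ImageNet-pretrained VGG16 on each lighting variant and examine the spread of the top-class softmax values. The file paths are placeholders, not the Phos directory layout.

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras.applications.vgg16 import VGG16, preprocess_input

model = VGG16(weights="imagenet")  # pretrained, no fine-tuning

def softmax_for(path):
    img = tf.keras.utils.load_img(path, target_size=(224, 224))
    x = preprocess_input(np.expand_dims(np.array(img, dtype=np.float32), 0))
    return model.predict(x, verbose=0)[0]  # 1000-way softmax vector

# e.g. the same scene captured under 15 illumination settings
variants = [f"scene1/lighting_{i:02d}.png" for i in range(15)]
probs = np.stack([softmax_for(p) for p in variants])

top_class = int(np.argmax(probs.mean(axis=0)))
print("top-class softmax per lighting condition:", probs[:, top_class])
print("std-dev across lighting conditions:", probs[:, top_class].std())
```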
{"title":"Object Recognition under Lighting Variations using Pre-Trained Networks","authors":"Kalpathy Sivaraman, A. Murthy","doi":"10.1109/AIPR.2018.8707399","DOIUrl":"https://doi.org/10.1109/AIPR.2018.8707399","url":null,"abstract":"We report the object-recognition performance of VGG16, ResNet, and SqueezeNet, three state-of-the-art Convolutional Neural Networks (CNNs) trained on ImageNet, across 15 different lighting conditions using the Phos dataset and a ResNet-like network trained on Pascal VOC on the ExDark dataset. The instabilities in the normalized softmax values are used to highlight that pre-trained networks are not robust to lighting variations. Our investigation yields a robustness analysis framework for analyzing the performance of CNNs under different lighting conditions.The Phos dataset consists of 15 scenes captured under different illumination conditions: 9 images captured under various strengths of uniform illumination, and 6 images under different degrees of non-uniform illumination. The ExDARK dataset consists of ten scenes under different illumination conditions. A Keras-based pipeline was developed to study the softmax values output by ImageNet-trained VGG16, ResNet, and SqueezeNet for the same object under the 15 different lighting conditions of the Phos dataset. A ResNet architecture was trained end-to-end on the PASCAL VOC dataset. Large variations observed in the softmax values provide empirical evidence of unstable performance and the need to augment training to account for lighting variations.","PeriodicalId":230582,"journal":{"name":"2018 IEEE Applied Imagery Pattern Recognition Workshop (AIPR)","volume":"58 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132335427","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Improving Nuclei Classification Performance in H&E Stained Tissue Images Using Fully Convolutional Regression Network and Convolutional Neural Network
Pub Date: 2018-10-01 | DOI: 10.1109/AIPR.2018.8707397
Ali S. Hamad, I. Ersoy, F. Bunyak
Detection and classification of nuclei in histopathology images is an important step in understanding the tumor microenvironment and in evaluating cancer progression and prognosis. The task is challenging due to factors such as varying cell morphologies, batch-to-batch variations in staining, and differences in sample preparation. We present a two-stage deep learning pipeline that combines a Fully Convolutional Regression Network (FCRN), which performs nuclei localization, with a Convolutional Neural Network (CNN), which performs nuclei classification. Instead of relying on hand-crafted features, the system learns the visual features needed for detection and classification of nuclei, making the process robust to the aforementioned variations. The performance of the proposed system has been quantitatively evaluated on images of hematoxylin and eosin (H&E) stained colon cancer tissue and compared to previous studies using the same data set. The proposed deep learning system produces promising results for detection and classification of nuclei in histopathology images.
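A minimal sketch of the two-stage idea described above (not the authors' implementation): a localization network produces a per-pixel nucleus-likelihood map, local maxima of that map give candidate nucleus centers, and a small patch around each center is passed to a second classifier network. The `fcrn` and `classifier` objects stand in for trained Keras-style models and are assumptions, as are the patch size and thresholds.

```python
import numpy as np
from skimage.feature import peak_local_max

def detect_and_classify(image, fcrn, classifier, patch=32, min_dist=6, thresh=0.5):
    """image: HxWx3 float array in [0, 1]; returns [((row, col), class_label), ...]."""
    heatmap = fcrn.predict(image[None, ...])[0, ..., 0]       # stage 1: localization map
    centers = peak_local_max(heatmap, min_distance=min_dist,  # candidate nucleus centers
                             threshold_abs=thresh)
    results = []
    h = patch // 2
    padded = np.pad(image, ((h, h), (h, h), (0, 0)), mode="reflect")
    for r, c in centers:
        crop = padded[r:r + patch, c:c + patch]               # stage 2: classify each patch
        label = int(np.argmax(classifier.predict(crop[None, ...])[0]))
        results.append(((r, c), label))
    return results
```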
{"title":"Improving Nuclei Classification Performance in H&E Stained Tissue Images Using Fully Convolutional Regression Network and Convolutional Neural Network","authors":"Ali S. Hamad, I. Ersoy, F. Bunyak","doi":"10.1109/AIPR.2018.8707397","DOIUrl":"https://doi.org/10.1109/AIPR.2018.8707397","url":null,"abstract":"Detection and classification of nuclei in histopathology images is an important step in the research of understanding tumor microenvironment and evaluating cancer progression and prognosis. The task is challenging due to imaging factors such as varying cell morphologies, batch-to-batch variations in staining, and sample preparation. We present a two-stage deep learning pipeline that combines a Fully Convolutional Regression Network (FCRN) that performs nuclei localization with a Convolution Neural Network (CNN) that performs nuclei classification. Instead of using hand-crafted features, the system learns the visual features needed for detection and classification of nuclei making the process robust to the aforementioned variations. The performance of the proposed system has been quantitatively evaluated on images of hematoxylin and eosin (H&E) stained colon cancer tissues and compared to the previous studies using the same data set. The proposed deep learning system produces promising results for detection and classification of nuclei in histopathology images.","PeriodicalId":230582,"journal":{"name":"2018 IEEE Applied Imagery Pattern Recognition Workshop (AIPR)","volume":"94 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116227523","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Line Segments based Rotation Invariant Descriptor for Disparate Images
Pub Date: 2018-10-01 | DOI: 10.1109/AIPR.2018.8707401
Teena Sharma, P. Agrawal, Piyush Sahoo, N. Verma, S. Vasikarla
Real-time computer vision applications demand robust image matching approaches that can cope with disparity between images. This can be achieved using a descriptor vector with scale- and rotation-invariance capability. This paper presents a rotation-invariant descriptor vector based on line-point duality. The proposed descriptor uses a simple, consistent method of keypoint detection. To obtain the descriptor vector, line segments present in the input image are used; these line segments are located within a region of interest around each detected keypoint. The resulting descriptor vector is used for matching disparate images. Experiments are carried out on four different image sets rotated over a range of angles to validate the real-time performance of the proposed descriptor. For comparative study, a normalized match ratio is computed using a multi-layered neural network with two hidden layers.
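A hedged illustration (not the paper's exact descriptor) of how a rotation-invariant description can be built from line segments inside a region of interest around a keypoint: segment angles are expressed relative to the dominant segment orientation and accumulated into a length-weighted histogram, so rotating the image shifts all angles equally and leaves the histogram unchanged. The detector choice (Canny plus probabilistic Hough) and all parameters are assumptions.

```python
import cv2
import numpy as np

def line_segment_descriptor(gray, keypoint, roi=48, bins=12):
    """gray: uint8 grayscale image; keypoint: (x, y) pixel coordinates."""
    x, y = int(keypoint[0]), int(keypoint[1])
    h = roi // 2
    patch = gray[max(0, y - h):y + h, max(0, x - h):x + h]
    edges = cv2.Canny(patch, 50, 150)
    lines = cv2.HoughLinesP(edges, 1, np.pi / 180, threshold=20,
                            minLineLength=8, maxLineGap=3)
    if lines is None:
        return np.zeros(bins)
    angles, lengths = [], []
    for x1, y1, x2, y2 in lines[:, 0]:
        angles.append(np.arctan2(y2 - y1, x2 - x1) % np.pi)
        lengths.append(np.hypot(x2 - x1, y2 - y1))
    angles, lengths = np.array(angles), np.array(lengths)
    dominant = angles[np.argmax(lengths)]          # reference orientation
    rel = (angles - dominant) % np.pi              # rotation-invariant angles
    hist, _ = np.histogram(rel, bins=bins, range=(0, np.pi), weights=lengths)
    return hist / (hist.sum() + 1e-9)
```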
{"title":"Line Segments based Rotation Invariant Descriptor for Disparate Images","authors":"Teena Sharma, P. Agrawal, Piyush Sahoo, N. Verma, S. Vasikarla","doi":"10.1109/AIPR.2018.8707401","DOIUrl":"https://doi.org/10.1109/AIPR.2018.8707401","url":null,"abstract":"Computer vision-based real-time applications demand robust image matching approaches due to disparity in images. This can be achieved using descriptor vector with scale and rotation invariance capability. This paper presents a rotation invariant descriptor vector formation based on line point duality. The proposed descriptor uses a simple consistent method of key point detection. For obtaining the descriptor vector, line segments present in the input image are used. These line segments are located within a region of interest around obtained key points in the input image. The obtained descriptor vector is used for matching of disparate images. Experiments are carried out for four different image sets with rotation at the range of angles to validate the performance of the proposed descriptor in real-time. For comparative study, normalized match ratio is computed using multi-layered neural network with two hidden layers.","PeriodicalId":230582,"journal":{"name":"2018 IEEE Applied Imagery Pattern Recognition Workshop (AIPR)","volume":"5 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"117255886","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
On Evaluating Video-based Generative Adversarial Networks (GANs)
Pub Date: 2018-10-01 | DOI: 10.1109/AIPR.2018.8707431
N. Ronquillo, Josh Harguess
We study the problem of evaluating video-based Generative Adversarial Networks (GANs) by applying existing image quality assessment methods to the explicit evaluation of videos generated by state-of-the-art frameworks [1]–[3]. Specifically, we provide results and discussion on using quantitative methods such as the Fréchet Inception Distance [4], the Multi-scale Structural Similarity Measure (MS-SSIM) [5], and the Birthday Paradox-inspired test [6], and compare these to the performance evaluation methods prevalent in the literature. We conclude that current testing methodologies are not sufficient for quality assurance in video-based GAN frameworks and that methods from the image-based GAN literature are useful to consider. The results of our experiments and a discussion on evaluating video-based GANs provide key insights that may be useful for developing new measures of quality assurance in future work.
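A minimal sketch, assuming TensorFlow, of one of the quantitative measures discussed above: pairwise MS-SSIM among generated frames as a rough diversity score, where consistently high values suggest the generator is producing near-identical samples (mode collapse). The random tensors below stand in for frames produced by a video GAN.

```python
import itertools
import tensorflow as tf

frames = tf.random.uniform((8, 256, 256, 3))  # placeholder for generated frames

scores = []
for i, j in itertools.combinations(range(frames.shape[0]), 2):
    s = tf.image.ssim_multiscale(frames[i], frames[j], max_val=1.0)
    scores.append(float(s))

print("mean pairwise MS-SSIM:", sum(scores) / len(scores))
```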
{"title":"On Evaluating Video-based Generative Adversarial Networks (GANs)","authors":"N. Ronquillo, Josh Harguess","doi":"10.1109/AIPR.2018.8707431","DOIUrl":"https://doi.org/10.1109/AIPR.2018.8707431","url":null,"abstract":"We study the problem of evaluating video-based Generative Adversarial Networks (GANs) by applying existing image quality assessment methods to the explicit evaluation of videos generated by state-of-the-art frameworks [1]–[3]. Specifically, we provide results and discussion on using quantitative methods such as the Fréchet Inception Distance [4], the Multi-scale Structural Similarity Measure (MS-SSIM) [5], as well as the Birthday Paradox inspired test [6] and compare these to the prevalent performance evaluation methods in the literature. We summarize that current testing methodologies are not sufficient for quality assurance in video-based GAN frameworks, and that methods based on the image-based GAN literature can be useful to consider. The results of our experiments and a discussion on evaluating video-based GANs provide key insight that may be useful in generating new measures of quality assurance in future work.","PeriodicalId":230582,"journal":{"name":"2018 IEEE Applied Imagery Pattern Recognition Workshop (AIPR)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123464290","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Understanding effects of atmospheric variables on spectral vegetation indices derived from satellite based time series of multispectral images
Pub Date: 2018-10-01 | DOI: 10.1109/AIPR.2018.8707430
Aleem Khaliq, M. Musci, M. Chiaberge
In agricultural practice, it is essential to monitor the phenological patterns of crops over time in order to manage agronomic activities such as irrigation, weed control, pest control, fertilization, and drainage. Over the past decade, owing to freely available data and large coverage area, satellite-based remote sensing has become the most popular and widely used technique compared with physical ground surveys, ground-based sensors, and aerial remote sensing. Sentinel-2 is a European satellite mission equipped with a state-of-the-art multispectral imager that offers high spectral resolution (13 spectral bands), high spatial resolution (up to 10 m per pixel), and good temporal resolution (6 to 10 days). Considering these features, a time series of Sentinel-2 multispectral images has been used to establish the temporal pattern of spectral vegetation indices (NDVI, SAVI, EVI, and RVI) of crops and to monitor their phenological behavior over time. In addition, the influence of atmospheric variables (such as air temperature and precipitation) on the derived spectral vegetation indices has been investigated in this work. The Land Use and Coverage Area frame Survey (LUCAS-2015) has been used as ground reference data for this study. This study shows that, by using Sentinel-2, understanding the relation between atmospheric conditions and crop phenological behavior can be useful for managing agricultural activities.
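A short sketch of the four spectral vegetation indices named above, computed from Sentinel-2 surface-reflectance bands (B04 = red, B08 = near-infrared, B02 = blue), with reflectance assumed to be scaled to [0, 1]. The array names and the random inputs are placeholders; the SAVI soil-adjustment factor L = 0.5 and the standard EVI coefficients are the commonly used values.

```python
import numpy as np

def vegetation_indices(red, nir, blue, L=0.5):
    eps = 1e-9  # avoid division by zero over water/shadow pixels
    ndvi = (nir - red) / (nir + red + eps)
    savi = (1 + L) * (nir - red) / (nir + red + L + eps)
    evi  = 2.5 * (nir - red) / (nir + 6.0 * red - 7.5 * blue + 1.0 + eps)
    rvi  = nir / (red + eps)
    return {"NDVI": ndvi, "SAVI": savi, "EVI": evi, "RVI": rvi}

# e.g. per-pixel indices for one acquisition date of a field parcel
red, nir, blue = (np.random.rand(100, 100) for _ in range(3))
indices = vegetation_indices(red, nir, blue)
print({name: float(v.mean()) for name, v in indices.items()})
```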
{"title":"Understanding effects of atmospheric variables on spectral vegetation indices derived from satellite based time series of multispectral images","authors":"Aleem Khaliq, M. Musci, M. Chiaberge","doi":"10.1109/AIPR.2018.8707430","DOIUrl":"https://doi.org/10.1109/AIPR.2018.8707430","url":null,"abstract":"In agricultural practices, it is very essential to monitor crops phenological pattern over the time to manage agronomic activities such as irrigation, weed control, pest control, fertilization, drainage system etc. From the past decade, due to free availability of data and large coverage area, satellite based remote sensing has been most popular and widely used among other techniques such as physical ground surveys, ground based sensors and aerial based remote sensing. Sentinel-2 is European based satellite equipped with the state of the art multispectral imager which offers high spectral resolution (13- spectral bands), high spatial resolution (up to 10m pixel-1) and good temporal resolution (6 to 10days). Considering these features, time series of multispectral images of sentinel-2 has been used to establish temporal pattern of spectral vegetation indices (i.e. NDVI, SAVI, EVI, RVI) of crops to monitor the phenological behavior over time. In addition, the influence of various atmospheric variables (such as temperature in the air and precipitation ) on the derived spectral vegetation indices has also been investigated in this work. Land use and coverage area frame survey (LUCAS-2015) has been used as ground reference data for this study. This study shows that by using sentinel-2, understanding relation between atmospheric conditions and crops phenological behavior can be useful to manage agricultural activities.","PeriodicalId":230582,"journal":{"name":"2018 IEEE Applied Imagery Pattern Recognition Workshop (AIPR)","volume":"2 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123073246","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
SAR Target Recognition with Deep Learning
Pub Date: 2018-10-01 | DOI: 10.1109/AIPR.2018.8707419
Ryan J. Soldin
The automated detection and classification of objects in imagery is an important topic for many applications in remote sensing, including the counting of cars and ships and the tracking of military vehicles for the defense and intelligence industry. Synthetic aperture radar (SAR) provides day/night and all-weather imaging capabilities, making it a powerful data source for Deep Learning (DL) algorithms that provide automatic target recognition (ATR) capabilities. DL classification was shown to be extremely effective on multi-spectral satellite imagery during the IARPA Functional Map of the World (fMoW) challenge. In our work we look to extend these techniques to SAR. We start by applying ResNet-18 to the Moving and Stationary Target Acquisition and Recognition (MSTAR) dataset. The MSTAR program, sponsored by DARPA and AFRL, consists of SAR collections of military-style targets acquired with an aerial X-band radar at one-foot resolution. We achieved an overall classification accuracy of 99% on 10 different classes of targets, confirming previously published results. We then extend this classifier to investigate an emerging target and the effects of limited training data on system performance.
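A minimal PyTorch sketch, not the author's training code, of fitting ResNet-18 to 10 MSTAR target classes arranged in an ImageFolder-style directory. The directory path, input size, epoch count, and hyper-parameters are illustrative assumptions; MSTAR chips are single-channel SAR magnitude images, so they are replicated to three channels here.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader
from torchvision import datasets, models, transforms

tfm = transforms.Compose([
    transforms.Grayscale(num_output_channels=3),  # SAR chips are single-channel
    transforms.Resize((128, 128)),
    transforms.ToTensor(),
])
train_set = datasets.ImageFolder("mstar/train", transform=tfm)  # placeholder path
loader = DataLoader(train_set, batch_size=32, shuffle=True)

model = models.resnet18(num_classes=10)           # 10 MSTAR target classes
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
criterion = nn.CrossEntropyLoss()

for epoch in range(10):
    correct, total = 0, 0
    for images, labels in loader:
        optimizer.zero_grad()
        logits = model(images)
        loss = criterion(logits, labels)
        loss.backward()
        optimizer.step()
        correct += (logits.argmax(1) == labels).sum().item()
        total += labels.numel()
    print(f"epoch {epoch}: train accuracy {correct / total:.3f}")
```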
{"title":"SAR Target Recognition with Deep Learning","authors":"Ryan J. Soldin","doi":"10.1109/AIPR.2018.8707419","DOIUrl":"https://doi.org/10.1109/AIPR.2018.8707419","url":null,"abstract":"The automated detection and classification of objects in imagery is an important topic for many applications in remote sensing. These can include the counting of cars and ships and the tracking of military vehicles for the defense and intelligence industry. Synthetic aperture radar (SAR) provides day/night and all-weather imaging capabilities. SAR is a powerful data source for Deep Learning (DL) algorithms to provide automatic target recognition (ATR) capabilities. DL classification was shown to be extremely effective on multi-spectral satellite imagery during the IARPA Functional Map of the World (fMoW). In our work we look to extend these techniques to SAR. We start by applying ResNet-18 to the Moving and Stationary Target Acquisition and Recognition (MSTAR) dataset. The MSTAR program, sponsored by DARPA and AFRL, consists of SAR collections of military style targets using an aerial X-band radar with one-foot resolution. We achieved an overall classification accuracy of 99% on 10 different classes of targets, confirming previously published results. We then extend this classifier to investigate an emerging target and the effects of limited training data on system performance.","PeriodicalId":230582,"journal":{"name":"2018 IEEE Applied Imagery Pattern Recognition Workshop (AIPR)","volume":"115 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123467722","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Performance Evaluation of Feature Descriptors for Aerial Imagery Mosaicking
Pub Date: 2018-10-01 | DOI: 10.1109/AIPR.2018.8707402
Rumana Aktar, H. Aliakbarpour, F. Bunyak, G. Seetharaman, K. Palaniappan
Mosaicking enables efficient summarization of the geospatial content of an aerial video, with applications in surveillance, activity detection, tracking, and related tasks. Scene clutter, the presence of distractors, parallax, illumination artifacts such as shadows and glare, and other complexities of aerial imaging such as large camera motion make the registration process challenging. Robust feature detection and description are needed to overcome these challenges before registration. This study investigates the computational complexity versus performance of selected feature detectors, namely Structure Tensor with NCC (ST+NCC), SURF, and ASIFT, within our Video Mosaicking and Summarization (VMZ) framework on the VIRAT benchmark aerial video dataset. ST+NCC and SURF are very fast but fail on a few complex images (with occlusion) from VIRAT. ASIFT is more robust than ST+NCC or SURF, though extremely time consuming. We also propose an Adaptive Descriptor (combining ST+NCC and ASIFT) that is 9x faster than ASIFT with comparable robustness.
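A hedged sketch of the structure-tensor plus NCC pairing evaluated above: corners from the structure tensor (Shi-Tomasi via OpenCV's goodFeaturesToTrack), patches matched across frames with normalized cross-correlation, and a homography fitted with RANSAC for warping one frame into the mosaic. Parameters are illustrative, not the paper's settings, and the exhaustive template search is kept simple for clarity.

```python
import cv2
import numpy as np

def register_st_ncc(ref_gray, mov_gray, patch=21, ncc_min=0.8):
    """ref_gray, mov_gray: uint8 grayscale frames; returns a 3x3 homography or None."""
    h = patch // 2
    corners = cv2.goodFeaturesToTrack(ref_gray, maxCorners=300,
                                      qualityLevel=0.01, minDistance=10)
    if corners is None:
        return None
    src, dst = [], []
    for (x, y) in corners[:, 0].astype(int):
        if (y - h < 0 or x - h < 0 or
                y + h + 1 > ref_gray.shape[0] or x + h + 1 > ref_gray.shape[1]):
            continue
        template = ref_gray[y - h:y + h + 1, x - h:x + h + 1]
        # NCC search over the whole moving frame (a local window would be faster)
        res = cv2.matchTemplate(mov_gray, template, cv2.TM_CCOEFF_NORMED)
        _, score, _, loc = cv2.minMaxLoc(res)
        if score >= ncc_min:
            src.append((x, y))
            dst.append((loc[0] + h, loc[1] + h))
    if len(src) < 4:
        return None
    H, _ = cv2.findHomography(np.float32(dst), np.float32(src), cv2.RANSAC, 3.0)
    return H  # warps the moving frame into the reference mosaic
```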
{"title":"Performance Evaluation of Feature Descriptors for Aerial Imagery Mosaicking","authors":"Rumana Aktar, H. Aliakbarpour, F. Bunyak, G. Seetharaman, K. Palaniappan","doi":"10.1109/AIPR.2018.8707402","DOIUrl":"https://doi.org/10.1109/AIPR.2018.8707402","url":null,"abstract":"Mosaicking enables efficient summary of geospatial content in an aerial video with applications in surveillance, activity detection, tracking, etc. Scene clutter, presence of distractors, parallax, illumination artifacts i.e. shadows, glare, and other complexities of aerial imaging such as large camera motion makes the registration process challenging. Robust feature detection and description is needed to overcome these challenges before registration. This study investigates the computational complexity versus performance of selected feature detectors such as Structure Tensor with NCC (ST+NCC), SURF, ASIFT within our Video Mosaicking and Summarization (VMZ) framework on VIRAT benchmark aerial video. ST+NCC and SURF is very fast but fails for few complex imagery (with occlusion) from VIRAT. ASIFT is more robust compared to ST+NCC or SURF, though extremely time consuming. We also propose an Adaptive Descriptor (combining ST+NCC and ASIFT) that is 9x faster than ASIFT with comparable robustness.","PeriodicalId":230582,"journal":{"name":"2018 IEEE Applied Imagery Pattern Recognition Workshop (AIPR)","volume":"46 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124811134","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Automated Annotation of Satellite Imagery using Model-based Projections
Pub Date: 2018-10-01 | DOI: 10.1109/AIPR.2018.8707425
R. Roberts, J. Goforth, G. Weinert, C. Grant, Will R. Ray, B. Stinson, Andrew M. Duncan
GeoVisipedia is a novel approach to annotating satellite imagery. It uses wiki pages rather than simple labels to annotate objects. Wiki-page annotations are particularly useful for annotating objects in imagery of complex geospatial configurations such as industrial facilities. GeoVisipedia uses the PRISM algorithm to project annotations applied to one image onto other imagery, thereby enabling ubiquitous annotation. This paper derives the PRISM algorithm, which uses image metadata and a 3D facility model to create a view matrix unique to each image. The view matrix is used to project model components onto a mask that aligns the components with the objects in the scene they represent. Wiki pages are linked to model components, which are in turn linked to the image via the component mask. An illustration of the efficacy of the PRISM algorithm is provided, demonstrating the projection of model components onto an effluent stack. We conclude with a discussion of the efficiency gains of GeoVisipedia over manual annotation and of the use of PRISM to create training sets for machine learning algorithms.
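A minimal sketch, assumed rather than the PRISM implementation, of the core projection step described above: 3D model-component vertices are mapped through a 3x4 camera/view matrix built from image metadata, giving pixel locations that can be rasterized into a per-component mask linking wiki annotations to the image. The function and variable names are hypothetical.

```python
import numpy as np

def project_points(P, points_3d):
    """P: 3x4 projection (view) matrix; points_3d: Nx3 world coordinates."""
    homog = np.hstack([points_3d, np.ones((len(points_3d), 1))])  # Nx4 homogeneous
    uvw = (P @ homog.T).T                                         # Nx3
    return uvw[:, :2] / uvw[:, 2:3]                               # pixel coordinates

def component_mask(P, components, image_shape):
    """components: dict name -> Nx3 vertices; returns dict name -> boolean mask."""
    masks = {}
    for name, verts in components.items():
        px = np.round(project_points(P, verts)).astype(int)
        mask = np.zeros(image_shape[:2], dtype=bool)
        inside = ((px[:, 0] >= 0) & (px[:, 0] < image_shape[1]) &
                  (px[:, 1] >= 0) & (px[:, 1] < image_shape[0]))
        mask[px[inside, 1], px[inside, 0]] = True  # sparse mask of projected vertices
        masks[name] = mask
    return masks
```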
{"title":"Automated Annotation of Satellite Imagery using Model-based Projections","authors":"R. Roberts, J. Goforth, G. Weinert, C. Grant, Will R. Ray, B. Stinson, Andrew M. Duncan","doi":"10.1109/AIPR.2018.8707425","DOIUrl":"https://doi.org/10.1109/AIPR.2018.8707425","url":null,"abstract":"GeoVisipedia is a new and novel approach to annotating satellite imagery. It uses wiki pages to annotate objects rather than simple labels. The use of wiki pages to contain annotations is particularly useful for annotating objects in imagery of complex geospatial configurations such as industrial facilities. GeoVisipedia uses the PRISM algorithm to project annotations applied to one image to other imagery, hence enabling ubiquitous annotation. This paper derives the PRISM algorithm, which uses image metadata and a 3D facility model to create a view matrix unique to each image. The view matrix is used to project model components onto a mask which aligns the components with the objects in the scene that they represent. Wiki pages are linked to model components, which are in turn linked to the image via the component mask. An illustration of the efficacy of the PRISM algorithm is provided, demonstrating the projection of model components onto an effluent stack. We conclude with a discussion of the efficiencies of GeoVisipedia over manual annotation, and the use of PRISM for creating training sets for machine learning algorithms.","PeriodicalId":230582,"journal":{"name":"2018 IEEE Applied Imagery Pattern Recognition Workshop (AIPR)","volume":"70 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114755744","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Automated Video Interpretability Assessment using Convolutional Neural Networks
Pub Date: 2018-10-01 | DOI: 10.1109/AIPR.2018.8707423
A. Kalukin
A neural network used to automate the assessment of video quality, as measured by the Video National Imagery Interpretability Rating Scale (VNIIRS), was able to determine the exact VNIIRS rating over 80% of the time.
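A hedged sketch in the spirit of the paper, not its architecture: a small Keras CNN that maps a video frame to one of a set of discrete VNIIRS rating levels. The input size, layer sizes, and the number of rating levels are assumptions.

```python
import tensorflow as tf
from tensorflow.keras import layers, models

NUM_LEVELS = 11  # assumed discretization of the VNIIRS scale

model = models.Sequential([
    layers.Input(shape=(224, 224, 3)),
    layers.Conv2D(32, 3, activation="relu"), layers.MaxPooling2D(),
    layers.Conv2D(64, 3, activation="relu"), layers.MaxPooling2D(),
    layers.Conv2D(128, 3, activation="relu"), layers.GlobalAveragePooling2D(),
    layers.Dense(128, activation="relu"),
    layers.Dense(NUM_LEVELS, activation="softmax"),  # one class per rating level
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.summary()
```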
{"title":"Automated Video Interpretability Assessment using Convolutional Neural Networks","authors":"A. Kalukin","doi":"10.1109/AIPR.2018.8707423","DOIUrl":"https://doi.org/10.1109/AIPR.2018.8707423","url":null,"abstract":"A neural network used to automate assessment of video quality, as measured by the Video National Imagery Interpretability Rating Scale (VNIIRS), was able to ascertain the exact VNIIRS rating over 80% of the time.","PeriodicalId":230582,"journal":{"name":"2018 IEEE Applied Imagery Pattern Recognition Workshop (AIPR)","volume":"519 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133227853","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}