A Dataset for Persistent Multi-target Multi-camera Tracking in RGB-D. Ryan Layne, S. Hannuna, M. Camplani, Jake Hall, Timothy M. Hospedales, T. Xiang, M. Mirmehdi, D. Damen. CVPRW 2017, pp. 1462-1470. doi:10.1109/CVPRW.2017.189

Video surveillance systems are now widely deployed to improve our lives by enhancing safety, security, health monitoring and business intelligence. This has motivated extensive research into automated video analysis. Nevertheless, there is a gap between the focus of contemporary research and the needs of end users of video surveillance systems. Many existing benchmarks and methodologies focus on narrowly defined problems in detection, tracking, re-identification or recognition. In contrast, end users face higher-level problems such as long-term monitoring of identities to build a picture of a person's activity over the course of a day, or producing usage statistics for a particular area of space, and they need these capabilities to be robust to challenges such as changes of clothing. Achieving this effectively requires less widely studied capabilities such as spatio-temporal reasoning about people's identities and locations within a space partially observed by multiple cameras over an extended period. To bridge this gap between research and required capabilities, we propose a new dataset, LIMA, that encompasses the challenges of monitoring a typical home/office environment. LIMA contains 4.5 hours of RGB-D video from three cameras monitoring a four-room house. To reflect the challenges of a realistic practical application, the dataset includes clothes changes and visitors, making the global reasoning a realistic open-set problem. In addition to the raw data, we provide identity annotation for benchmarking and tracking results from a contemporary RGB-D tracker, allowing focus on the higher-level monitoring problems.
Inferring Hidden Statuses and Actions in Video by Causal Reasoning. A. Fire, Song-Chun Zhu. CVPRW 2017, pp. 48-56. doi:10.1109/CVPRW.2017.13

In the physical world, cause and effect are inseparable: ambient conditions trigger humans to perform actions, thereby driving status changes of objects. In video, these actions and statuses may be hidden due to ambiguity, occlusion, or because they are otherwise unobservable, but humans nevertheless perceive them. In this paper, we extend the Causal And-Or Graph (C-AOG) to a sequential model representing actions and their effects on objects over time, and we build a probability model for it. For inference, we apply a Viterbi algorithm, grounded on probabilistic detections from video, to fill in hidden and misdetected actions and statuses. We analyze our method on a new video dataset that showcases causes and effects. Our results demonstrate the effectiveness of reasoning with causality over time.
Video-Based Person Re-identification by Deep Feature Guided Pooling. You Li, L. Zhuo, Jiafeng Li, Jing Zhang, Xi Liang, Q. Tian. CVPRW 2017, pp. 1454-1461. doi:10.1109/CVPRW.2017.188

Person re-identification (re-id) aims to match a specific person across non-overlapping views of different cameras, and is currently an active topic in computer vision. Compared with image-based person re-id, video-based techniques can achieve better performance by fully exploiting space-time information. This paper presents a novel video-based person re-id method named Deep Feature Guided Pooling (DFGP), which takes full advantage of the space-time information. The contributions of the method are: (1) a PCA-based convolutional network (PCN), a lightweight deep learning network, is trained to generate deep features of video frames. The deep features are aggregated by average pooling to obtain person deep feature vectors, which are then used to guide the generation of human appearance features, making the appearance features robust to the severe noise in videos. (2) Hand-crafted local features of videos are aggregated by max pooling to emphasise the motion variations of different persons, making the human descriptors more discriminative. (3) The final human descriptors combine deep features and hand-crafted local features to exploit their complementary advantages, improving identification performance. Experimental results show that our approach outperforms six other state-of-the-art video-based methods on the challenging PRID 2011 and iLIDS-VID video-based person re-id datasets.
Image Super Resolution Based on Fusing Multiple Convolution Neural Networks. Haoyu Ren, Mostafa El-Khamy, Jungwon Lee. CVPRW 2017, pp. 1050-1057. doi:10.1109/CVPRW.2017.142

In this paper, we focus on constructing an accurate super resolution system based on multiple Convolutional Neural Networks (CNNs). Each individual CNN is trained separately with a different network structure. A Context-wise Network Fusion (CNF) approach is proposed to integrate the outputs of the individual networks through additional convolution layers. After fine-tuning the whole fused network, accuracy improves significantly over the individual networks. We also discuss other network fusion schemes, including Pixel-Wise network Fusion (PWF) and Progressive Network Fusion (PNF). The experimental results show that CNF outperforms PWF and PNF. Using SRCNN as the individual network, the CNF network achieves state-of-the-art accuracy on benchmark image datasets.
The Stereoscopic Zoom. S. Pujades, Frederic Devernay, Laurent Boiron, Rémi Ronfard. CVPRW 2017, pp. 1295-1304. doi:10.1109/CVPRW.2017.170

We study camera models to generate stereoscopic zoom shots, i.e. shots using very long focal length lenses. Stereoscopic images are usually generated with two cameras. However, we show that two cameras are unable to create compelling stereoscopic images with extreme focal length lenses. Inspired by practitioners' use of long focal length lenses, we propose two different configurations: we "get closer" to the scene, or we create "perspective deformations". Both configurations are built upon state-of-the-art image-based rendering methods, allowing the formal deduction of precise camera parameters depending on the scene to be acquired. We present a proof of concept with the acquisition of a representative simplified scene, and discuss the advantages and drawbacks of each configuration.
NTIRE 2017 Challenge on Single Image Super-Resolution: Dataset and Study. E. Agustsson, R. Timofte. CVPRW 2017, pp. 1122-1131. doi:10.1109/CVPRW.2017.150

This paper introduces a novel large dataset for example-based single-image super-resolution and studies the state of the art as it emerged from the NTIRE 2017 challenge. The challenge is the first of its kind, with six competitions, hundreds of participants and tens of proposed solutions. Our newly collected DIVerse 2K resolution image dataset (DIV2K) was employed by the challenge. In our study we compare the solutions from the challenge to a set of representative methods from the literature and evaluate them using diverse measures on our proposed DIV2K dataset. Moreover, we conduct a number of experiments and draw conclusions on several topics of interest. We conclude that the NTIRE 2017 challenge pushes the state of the art in single-image super-resolution, reaching the best results to date on the popular Set5, Set14, B100 and Urban100 datasets and on our newly proposed DIV2K.
Automated Risk Assessment for Scene Understanding and Domestic Robots Using RGB-D Data and 2.5D CNNs at a Patch Level. Rob Dupre, Georgios Tzimiropoulos, V. Argyriou. CVPRW 2017, pp. 476-477. doi:10.1109/CVPRW.2017.65

In this work, we address automated risk assessment for 3D scenes. Using deep learning techniques, smart-enabled homes and domestic robots can be equipped with the functionality to detect, draw attention to, or mitigate hazards in a given scene. We extend an existing risk estimation framework that incorporates physics and shape descriptors by introducing a novel CNN architecture that allows risk detection at a patch level. Analysis is conducted on RGB-D data on a frame-by-frame basis, requiring no temporal information between frames.
Application of Computer Vision and Vector Space Model for Tactical Movement Classification in Badminton. K. Weeratunga, A. Dharmarathne, K. B. How. CVPRW 2017, pp. 132-138. doi:10.1109/CVPRW.2017.22

Performance profiling in sports allows the evaluation of opponents' tactics and the development of counter-tactics to gain a competitive advantage. The presented work develops a comprehensive methodology to automate tactical profiling in elite badminton. The proposed approach uses computer vision techniques to automate data gathering from video footage. The image processing algorithm is validated on video footage of the highest-level tournaments, including the Olympic Games. The average accuracy of player position detection is 96.03% and 97.09% on the two halves of a badminton court. Next, frequent trajectories of badminton players are extracted and classified according to their tactical relevance. The classification achieves 97.79% accuracy, 97.81% precision, 97.44% recall and a 97.62% F-score. The combination of automated player position detection, frequent trajectory extraction and the subsequent classification can be used to automatically generate player tactical profiles.
Track-Clustering Error Evaluation for Track-Based Multi-camera Tracking System Employing Human Re-identification. Chih-Wei Wu, Meng-Ting Zhong, Yu-Yu Tsao, Shao-Wen Yang, Yen-kuang Chen, Shao-Yi Chien. CVPRW 2017, pp. 1416-1424. doi:10.1109/CVPRW.2017.184

In this study, we present a set of new evaluation measures for the track-based multi-camera tracking (T-MCT) task that leverage clustering measures. We demonstrate that the proposed evaluation measures provide notable advantages over previous ones. Moreover, a distributed and online T-MCT framework is proposed, in which re-identification (Re-id) is embedded in T-MCT, to confirm the validity of the proposed evaluation measures. Experimental results reveal that with the proposed evaluation measures, the performance of T-MCT can be accurately measured and is highly correlated with the performance of Re-id. Furthermore, our T-MCT framework achieves a competitive score on the DukeMTMC dataset compared to previous work that used global optimization algorithms. Both the evaluation measures and the inter-camera tracking framework serve as stepping stones for multi-camera tracking.
Nonrigid Registration of Hyperspectral and Color Images with Vastly Different Spatial and Spectral Resolutions for Spectral Unmixing and Pansharpening. Yuan Zhou, Anand Rangarajan, P. Gader. CVPRW 2017, pp. 1571-1579. doi:10.1109/CVPRW.2017.201

In this paper, we propose a framework to register images with very large scale differences by utilizing the point spread function (PSF), and apply it to register hyperspectral and high-resolution color images. The algorithm minimizes a least-squares (LSQ) objective function that incorporates a spectral response function (SRF), a nonrigid freeform deformation applied to the hyperspectral image and a rigid transformation applied to the color image. The optimization problem is solved by updating the two transformations and the two physical functions in an alternating fashion. We evaluated the framework on a simulated Pavia University dataset and a real Salton Sea dataset, comparing the proposed algorithm with its rigid variant and two mutual information-based algorithms. The results indicate that the LSQ freeform version performs best on both the nonrigid simulation and the real datasets, with less than 0.15 pixel error given 1 pixel of nonrigid distortion in the hyperspectral domain.