Pub Date: 2021-09-19 | DOI: 10.1109/ICIP42928.2021.9506237
Liver Tumor Detection Via A Multi-Scale Intermediate Multi-Modal Fusion Network on MRI Images
Chao Pan, Peiyun Zhou, Jingru Tan, Bao-Ye Sun, Ruo-Yu Guan, Zhutao Wang, Ye Luo, Jianwei Lu
Automatic liver tumor detection can help doctors devise effective treatments. However, how to exploit multi-modal images to improve detection performance remains challenging. Common approaches to using multi-modal images rely on early, inter-layer, or late fusion; they either do not fully model intermediate multi-modal feature interaction or are not designed with tumor detection in mind. In this paper, we propose a novel multi-scale intermediate multi-modal fusion detection framework for multi-modal liver tumor detection. Unlike early or late fusion, it maintains two branches carrying different modal information and introduces cross-modal feature interaction progressively, thus better leveraging the complementary information contained in the multiple modalities. To further enhance the multi-modal context at all scales, we design a multi-modal enhanced feature pyramid. Extensive experiments on a collected liver tumor magnetic resonance imaging (MRI) dataset show that our framework outperforms other state-of-the-art detection approaches when multi-modal images are used.
{"title":"Liver Tumor Detection Via A Multi-Scale Intermediate Multi-Modal Fusion Network on MRI Images","authors":"Chao Pan, Peiyun Zhou, Jingru Tan, Bao-Ye Sun, Ruo-Yu Guan, Zhutao Wang, Ye Luo, Jianwei Lu","doi":"10.1109/ICIP42928.2021.9506237","DOIUrl":"https://doi.org/10.1109/ICIP42928.2021.9506237","url":null,"abstract":"Automatic liver tumor detection can assist doctors to make effective treatments. However, how to utilize multi-modal images to improve detection performance is still challenging. Common solutions for using multi-modal images consist of early, inter-layer, and late fusion. They either do not fully consider the intermediate multi-modal feature interaction or have not put their focus on tumor detection. In this paper, we propose a novel multi-scale intermediate multi-modal fusion detection framework to achieve multi-modal liver tumor detection. Unlike early or late fusion, it maintains two branches of different modal information and introduces cross-modal feature interaction progressively, thus better leveraging the complementary information contained in multi-modalities. To further enhance the multi-modal context at all scales, we design a multi-modal enhanced feature pyramid. Extensive experiments on the collected liver tumor magnetic resonance imaging (MRI) dataset show that our framework outperforms other state-of-the-art detection approaches in the case of using multi-modal images.","PeriodicalId":314429,"journal":{"name":"2021 IEEE International Conference on Image Processing (ICIP)","volume":"3 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-09-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122551476","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2021-09-19 | DOI: 10.1109/ICIP42928.2021.9506301
Online Weight Pruning Via Adaptive Sparsity Loss
George Retsinas, Athena Elafrou, G. Goumas, P. Maragos
Pruning neural networks has regained interest in recent years as a means to compress state-of-the-art deep neural networks and enable their deployment on resource-constrained devices. In this paper, we propose a robust sparsity controlling framework that efficiently prunes network parameters during training with minimal computational overhead. We incorporate fast mechanisms to prune individual layers and build upon these to automatically prune the entire network under a user-defined budget constraint. Key to our end-to-end network pruning approach is the formulation of an intuitive and easy-to-implement adaptive sparsity loss used to explicitly control sparsity during training, enabling efficient budget-aware optimization.
{"title":"Online Weight Pruning Via Adaptive Sparsity Loss","authors":"George Retsinas, Athena Elafrou, G. Goumas, P. Maragos","doi":"10.1109/ICIP42928.2021.9506301","DOIUrl":"https://doi.org/10.1109/ICIP42928.2021.9506301","url":null,"abstract":"Pruning neural networks has regained interest in recent years as a means to compress state-of-the-art deep neural networks and enable their deployment on resource-constrained devices. In this paper, we propose a robust sparsity controlling framework that efficiently prunes network parameters during training with minimal computational overhead. We incorporate fast mechanisms to prune individual layers and build upon these to automatically prune the entire network under a user-defined budget constraint. Key to our end-to-end network pruning approach is the formulation of an intuitive and easy-to-implement adaptive sparsity loss used to explicitly control sparsity during training, enabling efficient budget-aware optimization.","PeriodicalId":314429,"journal":{"name":"2021 IEEE International Conference on Image Processing (ICIP)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-09-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122593127","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2021-09-19 | DOI: 10.1109/ICIP42928.2021.9506493
Self-Supervised Disentangled Embedding For Robust Image Classification
Lanqi Liu, Zhenyu Duan, Guozheng Xu, Yi Xu
Recently, the security risk that adversarial samples pose to deep learning algorithms has been widely recognized. Most existing defense methods only consider the influence of attacks at the image level, while the effect of correlation among feature components has not been investigated. In fact, when one feature component is successfully attacked, its correlated components can be attacked with higher probability. In this paper, a self-supervised disentanglement-based defense framework is proposed, providing a general tool to disentangle features by greatly reducing the correlation among feature components, thus significantly improving the robustness of the classification network. The proposed framework reveals the important role of disentangled embeddings in defending against adversarial samples. Extensive experiments on several benchmark datasets validate that the proposed defense framework remains robust against a wide range of adversarial attacks. Moreover, the proposed model can be applied on top of typical defense methods to further improve them.
{"title":"Self-Supervised Disentangled Embedding For Robust Image Classification","authors":"Lanqi Liu, Zhenyu Duan, Guozheng Xu, Yi Xu","doi":"10.1109/ICIP42928.2021.9506493","DOIUrl":"https://doi.org/10.1109/ICIP42928.2021.9506493","url":null,"abstract":"Recently, the security of deep learning algorithms against adversarial samples has been widely recognized. Most of the existing defense methods only consider the attack influence on image level, while the effect of correlation among feature components has not been investigated. In fact, when one feature component is successfully attacked, its correlated components can be attacked with higher probability. In this paper, a self-supervised disentanglement based defense framework is proposed, providing a general tool to disentangle features by greatly reducing correlation among feature components, thus significantly improving the robustness of the classification network. The proposed framework reveals the important role of disentangled embedding in defending adversarial samples. Extensive experiments on several benchmark datasets validate that the proposed defense framework consistently presents its robustness against extensive adversarial attacks. Also, the proposed model can be applied to any typical defense method as a good promotion strategy.","PeriodicalId":314429,"journal":{"name":"2021 IEEE International Conference on Image Processing (ICIP)","volume":"15 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-09-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114438519","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2021-09-19 | DOI: 10.1109/ICIP42928.2021.9506111
Context-Aware Candidates for Image Cropping
Tianpei Lian, Z. Cao, Ke Xian, Zhiyu Pan, Weicai Zhong
Image cropping aims to enhance the aesthetic quality of a given image by removing unwanted areas. Existing image cropping methods can be divided into two groups: candidate-based and candidate-free methods. For candidate-based methods, dense predefined candidate boxes can indeed cover good crops, but the many candidates with low aesthetic quality may disturb the subsequent judgment and lead to undesirable results. For candidate-free methods, the cropping box is obtained directly from certain prior knowledge; however, relying on a single box is not stable enough given the subjectivity of image cropping. To combine the advantages of both groups while avoiding their shortcomings, we need fewer but more representative candidate boxes. To this end, we propose FCRNet, a fully convolutional regression network that predicts several context-aware cropping boxes in an ensemble manner as candidates. A multi-task loss is employed to supervise the generation of candidates. Unlike previous candidate-based works, FCRNet outputs a small number of context-aware candidates without any predefined box, and the final result is selected from these candidates by an aesthetic evaluation network or even manual selection. Extensive experiments show the superiority of our context-aware-candidate-based method over state-of-the-art approaches.
{"title":"Context-Aware Candidates for Image Cropping","authors":"Tianpei Lian, Z. Cao, Ke Xian, Zhiyu Pan, Weicai Zhong","doi":"10.1109/ICIP42928.2021.9506111","DOIUrl":"https://doi.org/10.1109/ICIP42928.2021.9506111","url":null,"abstract":"Image cropping aims to enhance the aesthetic quality of a given image by removing unwanted areas. Existing image cropping methods can be divided into two groups: candidate-based and candidate-free methods. For candidate-based methods, dense predefined candidate boxes can indeed cover good boxes, but most candidates with low aesthetic quality may disturb the following judgment and lead to an undesirable result. For candidate-free methods, the cropping box is directly acquired according to certain prior knowledge. However, the effect of only one box is not stable enough due to the subjectivity of image cropping. In order to combine the advantages of the above methods and overcome these shortcomings, we need fewer but more representative candidate boxes. To this end, we propose FCRNet, a fully convolutional regression network, which predicts several context-aware cropping boxes in an ensemble manner as candidates. A multi-task loss is employed to supervise the generation of candidates. Unlike previous candidate-based works, FCRNet outputs a small number of context-aware candidates without any predefined box and the final result is selected from these candidates by an aesthetic evaluation network or even manual selection. Extensive experiments show the superiority of our context-aware candidates based method over the state-of-the-art approaches.","PeriodicalId":314429,"journal":{"name":"2021 IEEE International Conference on Image Processing (ICIP)","volume":"9 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-09-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114453392","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2021-09-19 | DOI: 10.1109/ICIP42928.2021.9506306
Weakly Supervised Fingerprint Pore Extraction With Convolutional Neural Network
Rongxiao Tang, Shuang Sun, Feng Liu, Zhenhua Guo
Fingerprint recognition has been used for person identification for centuries, and fingerprint features are commonly divided into three levels. The level-3 feature is the fingerprint pore, which can be used to improve the performance of automatic fingerprint recognition and to prevent spoofing in high-resolution fingerprints. Therefore, accurate extraction of fingerprint pores is quite important. With the development of convolutional neural networks (CNNs), researchers have made great progress in fingerprint feature extraction. However, these supervised methods require manually labelled pores to train the network, and labelling pores is tedious and time consuming because there are hundreds of pores in one fingerprint. In this paper, we design a weakly supervised pore extraction method that avoids manual labelling and trains the network with noisy labels. This method achieves results comparable with a supervised CNN-based method.
{"title":"Weakly Supervised Fingerprint Pore Extraction With Convolutional Neural Network","authors":"Rongxiao Tang, Shuang Sun, Feng Liu, Zhenhua Guo","doi":"10.1109/ICIP42928.2021.9506306","DOIUrl":"https://doi.org/10.1109/ICIP42928.2021.9506306","url":null,"abstract":"Fingerprint recognition has been used for person identification for centuries, and fingerprint features are divided into three levels. The level 3 feature is the fingerprint pore, which can be used to improve the performance of the automatic fingerprint recognition performance and to prevent spoofing in high-resolution fingerprints. Therefore, the accurate extraction of fingerprint pores is quite important. With the development of convolutional neural networks (CNNs), researchers have made great progress in fingerprint feature extraction. However, these supervised-based methods require manually labelled pores to train the network, and labelling pores is very tedious and time consuming because there are hundreds of pores in one fingerprint. In this paper, we design a weakly supervised pore extraction method that avoids manual label processing and trains the network with a noisy label. This method can achieve results comparable with a supervised CNN-based method.","PeriodicalId":314429,"journal":{"name":"2021 IEEE International Conference on Image Processing (ICIP)","volume":"2 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-09-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121869440","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2021-09-19 | DOI: 10.1109/ICIP42928.2021.9506308
Part Uncertainty Estimation Convolutional Neural Network For Person Re-Identification
Wenyu Sun, Jiyang Xie, Jiayan Qiu, Zhanyu Ma
Due to the large amount of noisy data in the person re-identification (ReID) task, ReID models are usually affected by data uncertainty. Deep uncertainty estimation is therefore important for improving model robustness and matching accuracy. To this end, we propose a part-based uncertainty convolutional neural network (PUCNN), which introduces part-based uncertainty estimation into the baseline model. On the one hand, PUCNN improves robustness to noisy data by modeling the feature embedding as a distribution and constraining the part-based uncertainty. On the other hand, PUCNN improves the cumulative matching characteristics (CMC) performance of the model by filtering out low-quality training samples according to the estimated uncertainty score. Experiments on both non-video datasets (the noisy Market-1501 and DukeMTMC) and video datasets (PRID2011, iLiDS-VID and MARS) demonstrate that our proposed method achieves encouraging and promising performance.
{"title":"Part Uncertainty Estimation Convolutional Neural Network For Person Re-Identification","authors":"Wenyu Sun, Jiyang Xie, Jiayan Qiu, Zhanyu Ma","doi":"10.1109/ICIP42928.2021.9506308","DOIUrl":"https://doi.org/10.1109/ICIP42928.2021.9506308","url":null,"abstract":"Due to the large amount of noisy data in person re-identification (ReID) task, the ReID models are usually affected by the data uncertainty. Therefore, the deep uncertainty estimation method is important for improving the model robustness and matching accuracy. To this end, we propose a part-based uncertainty convolutional neural network (PUCNN), which introduces the part-based uncertainty estimation into the baseline model. On the one hand, PUCNN improves the model robustness to noisy data by distributilizing the feature embedding and constraining the part-based uncertainty. On the other hand, PUCNN improves the cumulative matching characteristics (CMC) performance of the model by filtering out low-quality training samples according to the estimated uncertainty score. The experiments on both non-video datasets, the noised Market-1501 and DukeMTMC, and video datasets, PRID2011, iLiDS-VID and MARS, demonstrate that our proposed method achieves encouraging and promising performance.","PeriodicalId":314429,"journal":{"name":"2021 IEEE International Conference on Image Processing (ICIP)","volume":"32 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-09-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122004004","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2021-09-19 | DOI: 10.1109/ICIP42928.2021.9506788
Class Incremental Learning for Video Action Classification
Jiawei Ma, Xiaoyu Tao, Jianxing Ma, Xiaopeng Hong, Yihong Gong
Class Incremental Learning (CIL), in which CNN models learn new classes incrementally, is a hot topic in machine learning. However, most CIL studies target image classification and object recognition, and few address video action classification. To mitigate this gap, we present a new Grow When Required network (GWR) based video CIL framework for action classification. GWR learns knowledge incrementally by modeling the manifold of video frames for each encountered action class in feature space. We also introduce a Knowledge Consolidation (KC) method to separate the feature manifolds of old and new classes, and an associative matrix for label prediction. Experimental results on KTH and Weizmann demonstrate the effectiveness of the framework.
{"title":"Class Incremental Learning for Video Action Classification","authors":"Jiawei Ma, Xiaoyu Tao, Jianxing Ma, Xiaopeng Hong, Yihong Gong","doi":"10.1109/ICIP42928.2021.9506788","DOIUrl":"https://doi.org/10.1109/ICIP42928.2021.9506788","url":null,"abstract":"Class Incremental Learning (CIL) is a hot topic in machine learning for CNN models to learn new classes incrementally. However, most of the CIL studies are for image classification and object recognition tasks and few CIL studies are available for video action classification. To mitigate this problem, in this paper, we present a new Grow When Required network (GWR) based video CIL framework for action classification. GWR learns knowledge incrementally by modeling the manifold of video frames for each encountered action class in feature space. We also introduce a Knowledge Consolidation (KC) method to separate the feature manifolds of old class and new class and introduce an associative matrix for label prediction. Experimental results on KTH and Weizmann demonstrate the effectiveness of the framework.","PeriodicalId":314429,"journal":{"name":"2021 IEEE International Conference on Image Processing (ICIP)","volume":"31 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-09-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122153541","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2021-09-19 | DOI: 10.1109/ICIP42928.2021.9506135
Real-Time 3D Hand-Object Pose Estimation for Mobile Devices
Yue Yin, C. McCarthy, Dana Rezazadegan
Interest in 3D hand pose estimation is rapidly growing, offering the potential for real-time hand gesture recognition in a range of interactive VR/AR applications, and beyond. Most current 3D hand pose estimation models rely on dedicated depth-sensing cameras and/or specialised hardware support to handle both the high computation and memory requirements. However, such requirements hinder the practical application of such models on mobile devices or in other embedded computing contexts. To address this, we propose a lightweight model for hand and object pose estimation specifically targeting mobile applications. Using RGB images only, we show how our approach achieves real-time performance, comparable accuracy, and an 81% model size reduction compared with state-of-the-art methods, thereby supporting the feasibility of the model for deployment on mobile platforms.
{"title":"Real-Time 3D Hand-Object Pose Estimation for Mobile Devices","authors":"Yue Yin, C. McCarthy, Dana Rezazadegan","doi":"10.1109/ICIP42928.2021.9506135","DOIUrl":"https://doi.org/10.1109/ICIP42928.2021.9506135","url":null,"abstract":"Interest in 3D hand pose estimation is rapidly growing, offering the potential for real-time hand gesture recognition in a range of interactive VR/AR applications, and beyond. Most current 3D hand pose estimation models rely on dedicated depth-sensing cameras and/or specialised hardware support to handle both the high computation and memory requirements. However, such requirements hinder the practical application of such models on mobile devices or in other embedded computing contexts. To address this, we propose a lightweight model for hand and object pose estimation specifically targeting mobile applications. Using RGB images only, we show how our approach achieves real-time performance, comparable accuracy, and an 81% model size reduction compared with state-of-the-art methods, thereby supporting the feasibility of the model for deployment on mobile platforms.","PeriodicalId":314429,"journal":{"name":"2021 IEEE International Conference on Image Processing (ICIP)","volume":"15 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-09-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129833512","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2021-09-19 | DOI: 10.1109/ICIP42928.2021.9506178
Positional Encoding: Improving Class-Imbalanced Motorcycle Helmet use Classification
Hanhe Lin, Guangan Chen, F. Siebert
Recent advances in the automated detection of motorcycle riders’ helmet use have enabled road safety actors to process large-scale video data efficiently and with high accuracy. To distinguish drivers from passengers in helmet use, the most straightforward approach is to train a multi-class classifier in which each class corresponds to a specific combination of rider positions and individual riders’ helmet use. However, such a strategy results in a long-tailed data distribution, with critically few samples for a number of uncommon classes. In this paper, we propose a novel approach to address this limitation. Letting n be the maximum number of riders a motorcycle can hold, we encode the helmet use on a motorcycle as a vector with 2n bits, where the first n bits denote whether the encoded positions have riders, and the latter n bits denote whether the rider in the corresponding position wears a helmet. With this positional encoding of helmet use, we propose a deep learning model built on an existing image classification architecture. The model simultaneously trains 2n binary classifiers, which allows more balanced samples for training. The method is simple to implement and requires no hyperparameter tuning. Experimental results demonstrate that our approach outperforms state-of-the-art approaches by 1.9% in accuracy.
{"title":"Positional Encoding: Improving Class-Imbalanced Motorcycle Helmet use Classification","authors":"Hanhe Lin, Guangan Chen, F. Siebert","doi":"10.1109/ICIP42928.2021.9506178","DOIUrl":"https://doi.org/10.1109/ICIP42928.2021.9506178","url":null,"abstract":"Recent advances in the automated detection of motorcycle riders’ helmet use have enabled road safety actors to process large scale video data efficiently and with high accuracy. To distinguish drivers from passengers in helmet use, the most straightforward way is to train a multi-class classifier, where each class corresponds to a specific combination of rider position and individual riders’ helmet use. However, such strategy results in long-tailed data distribution, with critically low class samples for a number of uncommon classes. In this paper, we propose a novel approach to address this limitation. Let n be the maximum number of riders a motorcycle can hold, we encode the helmet use on a motorcycle as a vector with 2n bits, where the first n bits denote if the encoded positions have riders, and the latter n bits denote if the rider in the corresponding position wears a helmet. With the novel helmet use positional encoding, we propose a deep learning model that stands on existing image classification architecture. The model simultaneously trains 2n binary classifiers, which allows more balanced samples for training. This method is simple to implement and requires no hyperparameter tuning. Experimental results demonstrate our approach outperforms the state-of-the-art approaches by 1.9% accuracy.","PeriodicalId":314429,"journal":{"name":"2021 IEEE International Conference on Image Processing (ICIP)","volume":"49 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-09-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128215446","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2021-09-19 | DOI: 10.1109/ICIP42928.2021.9506032
A Foveated Video Quality Assessment Model Using Space-Variant Natural Scene Statistics
Y. Jin, T. Goodall, Anjul Patney, R. Webb, A. Bovik
In Virtual Reality (VR) systems, head-mounted displays (HMDs) are widely used to present VR content. When displaying immersive (360° video) scenes, greater challenges arise due to limitations of computing power, frame rate, and transmission bandwidth. To address these problems, a variety of foveated video compression and streaming methods have been proposed, which seek to exploit the nonuniform sampling density of the retinal photoreceptors and ganglion cells, which decreases rapidly with increasing eccentricity. Creating foveated immersive video content leads to the need for specialized foveated video quality predictors. Here we propose a No-Reference (NR, or blind) method which we call “Space-Variant BRISQUE (SV-BRISQUE),” based on a new space-variant natural scene statistics model. When tested on a large database of foveated, compression-distorted videos along with human opinions of them, we found that our new model achieves state-of-the-art (SOTA) performance, with correlations of 0.88 / 0.90 (PLCC / SROCC) against human subjective judgments.
{"title":"A Foveated Video Quality Assessment Model Using Space-Variant Natural Scene Statistics","authors":"Y. Jin, T. Goodall, Anjul Patney, R. Webb, A. Bovik","doi":"10.1109/ICIP42928.2021.9506032","DOIUrl":"https://doi.org/10.1109/ICIP42928.2021.9506032","url":null,"abstract":"In Virtual Reality (VR) systems, head mounted displays (HMDs) are widely used to present VR contents. When displaying immersive (360° video) scenes, greater challenges arise due to limitations of computing power, frame rate, and transmission bandwidth. To address these problems, a variety of foveated video compression and streaming methods have been proposed, which seek to exploit the nonuniform sampling density of the retinal photoreceptors and ganglion cells, which decreases rapidly with increasing eccentricity. Creating foveated immersive video content leads to the need for specialized foveated video quality pridictors. Here we propose a No-Reference (NR or blind) method which we call “Space-Variant BRISQUE (SV-BRISQUE),” which is based on a new space-variant natural scene statistics model. When tested on a large database of foveated, compression-distorted videos along with human opinions of them, we found that our new model algorithm achieves state of the art (SOTA) performance with correlation 0.88 / 0.90 (PLCC / SROCC) against human subjectivity.","PeriodicalId":314429,"journal":{"name":"2021 IEEE International Conference on Image Processing (ICIP)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-09-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128703662","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}