
Latest publications from the 2021 IEEE International Conference on Image Processing (ICIP)

Liver Tumor Detection Via A Multi-Scale Intermediate Multi-Modal Fusion Network on MRI Images
Pub Date: 2021-09-19 | DOI: 10.1109/ICIP42928.2021.9506237
Chao Pan, Peiyun Zhou, Jingru Tan, Bao-Ye Sun, Ruo-Yu Guan, Zhutao Wang, Ye Luo, Jianwei Lu
Automatic liver tumor detection can help doctors plan effective treatments. However, how to exploit multi-modal images to improve detection performance remains challenging. Common strategies for using multi-modal images are early, inter-layer, and late fusion; they either do not fully model intermediate multi-modal feature interaction or are not designed for tumor detection. In this paper, we propose a novel multi-scale intermediate multi-modal fusion detection framework for multi-modal liver tumor detection. Unlike early or late fusion, it maintains two branches for the different modalities and introduces cross-modal feature interaction progressively, thus better leveraging the complementary information contained in the modalities. To further enhance the multi-modal context at all scales, we design a multi-modal enhanced feature pyramid. Extensive experiments on a collected liver tumor magnetic resonance imaging (MRI) dataset show that our framework outperforms other state-of-the-art detection approaches when multi-modal images are used.
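The abstract does not come with code; as a rough sketch of the intermediate-fusion idea it describes (two modality branches that exchange features progressively at several scales, feeding a multi-scale pyramid), the PyTorch fragment below is illustrative only, and every module name and layer choice (`CrossModalFusion`, `TwoBranchBackbone`, the channel widths) is an assumption rather than the authors' implementation.

```python
import torch
import torch.nn as nn

class CrossModalFusion(nn.Module):
    """Exchange information between two modality branches at one scale (illustrative only)."""
    def __init__(self, channels):
        super().__init__()
        self.mix_a = nn.Conv2d(2 * channels, channels, kernel_size=1)
        self.mix_b = nn.Conv2d(2 * channels, channels, kernel_size=1)

    def forward(self, feat_a, feat_b):
        fused = torch.cat([feat_a, feat_b], dim=1)
        # Each branch keeps its own stream but absorbs context from the other one.
        return feat_a + self.mix_a(fused), feat_b + self.mix_b(fused)

class TwoBranchBackbone(nn.Module):
    """Two modality-specific branches with progressive intermediate fusion at every scale."""
    def __init__(self, channels=(16, 32, 64)):
        super().__init__()
        def stage(cin, cout):
            return nn.Sequential(nn.Conv2d(cin, cout, 3, stride=2, padding=1), nn.ReLU(inplace=True))
        cins = (1,) + channels[:-1]
        self.stages_a = nn.ModuleList(stage(ci, co) for ci, co in zip(cins, channels))
        self.stages_b = nn.ModuleList(stage(ci, co) for ci, co in zip(cins, channels))
        self.fusions = nn.ModuleList(CrossModalFusion(c) for c in channels)

    def forward(self, mod_a, mod_b):
        pyramid = []  # multi-scale fused features, e.g. for a detection head
        for stage_a, stage_b, fuse in zip(self.stages_a, self.stages_b, self.fusions):
            mod_a, mod_b = stage_a(mod_a), stage_b(mod_b)
            mod_a, mod_b = fuse(mod_a, mod_b)
            pyramid.append(torch.cat([mod_a, mod_b], dim=1))
        return pyramid

# Usage with two single-channel MRI modalities of the same geometry.
t1, t2 = torch.randn(1, 1, 128, 128), torch.randn(1, 1, 128, 128)
features = TwoBranchBackbone()(t1, t2)
```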
Citations: 4
Online Weight Pruning Via Adaptive Sparsity Loss
Pub Date: 2021-09-19 | DOI: 10.1109/ICIP42928.2021.9506301
George Retsinas, Athena Elafrou, G. Goumas, P. Maragos
Pruning neural networks has regained interest in recent years as a means to compress state-of-the-art deep neural networks and enable their deployment on resource-constrained devices. In this paper, we propose a robust sparsity controlling framework that efficiently prunes network parameters during training with minimal computational overhead. We incorporate fast mechanisms to prune individual layers and build upon these to automatically prune the entire network under a user-defined budget constraint. Key to our end-to-end network pruning approach is the formulation of an intuitive and easy-to-implement adaptive sparsity loss used to explicitly control sparsity during training, enabling efficient budget-aware optimization.
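As a minimal sketch of the general idea, assuming a simple L1-based regularizer whose strength is adapted toward a user-defined sparsity budget (the paper's actual adaptive sparsity loss is not reproduced here), one could write:

```python
import torch

def soft_sparsity(weights, eps=1e-3):
    """Fraction of weights with magnitude below eps (a rough proxy for pruned weights)."""
    total = sum(w.numel() for w in weights)
    small = sum((w.abs() < eps).sum().item() for w in weights)
    return small / total

def adaptive_sparsity_loss(weights, current_sparsity, target_sparsity, base_weight=1e-4):
    """L1 penalty whose strength grows while the network is denser than the budget."""
    gap = max(target_sparsity - current_sparsity, 0.0)   # 0 once the budget is met
    l1 = sum(w.abs().mean() for w in weights)
    return base_weight * (1.0 + 10.0 * gap) * l1

# Inside a training step (model and task_loss assumed to exist):
# weights = [p for p in model.parameters() if p.dim() > 1]
# loss = task_loss + adaptive_sparsity_loss(weights, soft_sparsity(weights), target_sparsity=0.8)
```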
Citations: 2
Self-Supervised Disentangled Embedding For Robust Image Classification
Pub Date: 2021-09-19 | DOI: 10.1109/ICIP42928.2021.9506493
Lanqi Liu, Zhenyu Duan, Guozheng Xu, Yi Xu
The vulnerability of deep learning algorithms to adversarial samples has recently been widely recognized. Most existing defense methods only consider the influence of attacks at the image level, while the effect of correlation among feature components has not been investigated. In fact, once one feature component is successfully attacked, its correlated components can be attacked with higher probability. In this paper, a self-supervised disentanglement-based defense framework is proposed. It provides a general tool for disentangling features by greatly reducing the correlation among feature components, thus significantly improving the robustness of the classification network. The proposed framework reveals the important role of disentangled embeddings in defending against adversarial samples. Extensive experiments on several benchmark datasets validate that the proposed defense framework remains consistently robust against a wide range of adversarial attacks. Moreover, the proposed model can be combined with any typical defense method as a general enhancement strategy.
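The abstract does not spell out the disentanglement objective; a common way to reduce correlation among feature components, shown here purely as an illustrative stand-in and not as the authors' loss, is to penalize the off-diagonal entries of the batch covariance of the embedding:

```python
import torch

def decorrelation_penalty(features):
    """Penalize off-diagonal covariance so feature components become (nearly) uncorrelated.

    features: (batch, dim) embedding from the backbone. Illustrative stand-in for the
    paper's self-supervised disentanglement objective, not the authors' exact loss.
    """
    centered = features - features.mean(dim=0, keepdim=True)
    cov = centered.t() @ centered / (features.size(0) - 1)
    off_diag = cov - torch.diag(torch.diag(cov))
    return (off_diag ** 2).sum() / features.size(1)

# total_loss = classification_loss + lambda_decorr * decorrelation_penalty(embeddings)
```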
Citations: 0
Context-Aware Candidates for Image Cropping
Pub Date: 2021-09-19 | DOI: 10.1109/ICIP42928.2021.9506111
Tianpei Lian, Z. Cao, Ke Xian, Zhiyu Pan, Weicai Zhong
Image cropping aims to enhance the aesthetic quality of a given image by removing unwanted areas. Existing image cropping methods can be divided into two groups: candidate-based and candidate-free methods. For candidate-based methods, dense predefined candidate boxes can indeed cover good crops, but the many candidates of low aesthetic quality may disturb the subsequent ranking and lead to undesirable results. For candidate-free methods, the cropping box is obtained directly from certain prior knowledge; however, relying on a single box is not stable enough, given the subjectivity of image cropping. To combine the advantages of these methods and overcome their shortcomings, we need fewer but more representative candidate boxes. To this end, we propose FCRNet, a fully convolutional regression network that predicts several context-aware cropping boxes in an ensemble manner as candidates. A multi-task loss is employed to supervise the generation of candidates. Unlike previous candidate-based works, FCRNet outputs a small number of context-aware candidates without any predefined boxes, and the final result is selected from these candidates by an aesthetic evaluation network or even manual selection. Extensive experiments show the superiority of our context-aware-candidates-based method over state-of-the-art approaches.
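To make the selection step concrete, the toy sketch below scores a small set of candidate crop boxes with a stand-in aesthetic scorer and keeps the best one; the network shapes, the `CandidateSelector` name, and the box format are assumptions and do not reflect FCRNet's actual architecture.

```python
import torch
import torch.nn as nn

class CandidateSelector(nn.Module):
    """Score a small set of candidate crop boxes and return the best one (illustrative)."""
    def __init__(self, feat_dim=128):
        super().__init__()
        # Box is (x1, y1, x2, y2) normalized to [0, 1]; the scorer is a stand-in aesthetic model.
        self.scorer = nn.Sequential(nn.Linear(feat_dim + 4, 64), nn.ReLU(), nn.Linear(64, 1))

    def forward(self, image_feat, boxes):
        # image_feat: (feat_dim,)   boxes: (K, 4) candidates from the regression network
        k = boxes.size(0)
        scores = self.scorer(torch.cat([image_feat.expand(k, -1), boxes], dim=1)).squeeze(1)
        return boxes[scores.argmax()], scores

feat, candidates = torch.randn(128), torch.rand(6, 4)   # random stand-ins for 6 candidates
best_box, scores = CandidateSelector()(feat, candidates)
```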
Citations: 1
Weakly Supervised Fingerprint Pore Extraction With Convolutional Neural Network
Pub Date: 2021-09-19 | DOI: 10.1109/ICIP42928.2021.9506306
Rongxiao Tang, Shuang Sun, Feng Liu, Zhenhua Guo
Fingerprint recognition has been used for person identification for centuries, and fingerprint features are commonly divided into three levels. Level 3 features are the fingerprint pores, which can be used to improve automatic fingerprint recognition performance and to prevent spoofing in high-resolution fingerprints. Accurate extraction of fingerprint pores is therefore quite important. With the development of convolutional neural networks (CNNs), researchers have made great progress in fingerprint feature extraction. However, these supervised methods require manually labelled pores to train the network, and labelling pores is very tedious and time consuming because there are hundreds of pores in one fingerprint. In this paper, we design a weakly supervised pore extraction method that avoids manual label processing and trains the network with noisy labels. This method achieves results comparable with a supervised CNN-based method.
Citations: 1
Part Uncertainty Estimation Convolutional Neural Network For Person Re-Identification
Pub Date: 2021-09-19 | DOI: 10.1109/ICIP42928.2021.9506308
Wenyu Sun, Jiyang Xie, Jiayan Qiu, Zhanyu Ma
Due to the large amount of noisy data in the person re-identification (ReID) task, ReID models are usually affected by data uncertainty. Deep uncertainty estimation is therefore important for improving model robustness and matching accuracy. To this end, we propose a part-based uncertainty convolutional neural network (PUCNN), which introduces part-based uncertainty estimation into the baseline model. On the one hand, PUCNN improves model robustness to noisy data by modeling the feature embedding as a distribution and constraining the part-based uncertainty. On the other hand, PUCNN improves the cumulative matching characteristics (CMC) performance of the model by filtering out low-quality training samples according to the estimated uncertainty score. Experiments on both non-video datasets (the noised Market-1501 and DukeMTMC) and video datasets (PRID2011, iLiDS-VID and MARS) demonstrate that the proposed method achieves encouraging and promising performance.
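One way to realize the "filter low-quality samples by uncertainty" step, sketched under the assumption that per-sample losses and uncertainty scores are already available (the thresholding rule PUCNN actually uses is not given in the abstract), is:

```python
import torch

def filter_by_uncertainty(losses, uncertainty, keep_ratio=0.8):
    """Keep only the keep_ratio fraction of samples with the lowest estimated uncertainty.

    losses:      (batch,) per-sample training losses
    uncertainty: (batch,) per-sample uncertainty scores (e.g. a predicted variance)
    This mirrors the general idea only; PUCNN's own filtering rule is defined in the paper.
    """
    k = max(1, int(keep_ratio * losses.numel()))
    keep_idx = torch.topk(-uncertainty, k).indices     # smallest uncertainty first
    return losses[keep_idx].mean()

# batch_loss = filter_by_uncertainty(per_sample_ce, predicted_sigma, keep_ratio=0.8)
```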
Citations: 1
Class Incremental Learning for Video Action Classification
Pub Date: 2021-09-19 | DOI: 10.1109/ICIP42928.2021.9506788
Jiawei Ma, Xiaoyu Tao, Jianxing Ma, Xiaopeng Hong, Yihong Gong
Class Incremental Learning (CIL), in which CNN models learn new classes incrementally, is a hot topic in machine learning. However, most CIL studies target image classification and object recognition, and few address video action classification. To mitigate this problem, we present a new Grow When Required network (GWR) based video CIL framework for action classification. GWR learns knowledge incrementally by modeling the manifold of video frames for each encountered action class in feature space. We also introduce a Knowledge Consolidation (KC) method to separate the feature manifolds of old and new classes, and an associative matrix for label prediction. Experimental results on KTH and Weizmann demonstrate the effectiveness of the framework.
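GWR is an existing self-organizing network; the fragment below is a deliberately stripped-down sketch of its grow-when-required rule (add a prototype when no existing node matches the current frame feature well enough) and omits edges, node ageing, firing counters, and the paper's Knowledge Consolidation step.

```python
import numpy as np

class TinyGWR:
    """Minimal 'grow when required' memory: one prototype set per action class (simplified)."""
    def __init__(self, activity_threshold=0.5, lr=0.1):
        self.nodes = []                 # list of prototype vectors
        self.a_t = activity_threshold   # grow when the best activity falls below this
        self.lr = lr

    def update(self, x):
        if not self.nodes:
            self.nodes.append(x.copy())
            return
        dists = [np.linalg.norm(x - w) for w in self.nodes]
        best = int(np.argmin(dists))
        activity = np.exp(-dists[best])          # close match -> activity near 1
        if activity < self.a_t:
            # No node represents this frame well: insert a new one between input and winner.
            self.nodes.append((x + self.nodes[best]) / 2.0)
        else:
            # Otherwise move the winner slightly toward the input.
            self.nodes[best] += self.lr * (x - self.nodes[best])

gwr = TinyGWR()
for frame_feat in np.random.randn(100, 64):      # stand-in for per-frame CNN features
    gwr.update(frame_feat)
```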
Citations: 4
Real-Time 3D Hand-Object Pose Estimation for Mobile Devices
Pub Date: 2021-09-19 | DOI: 10.1109/ICIP42928.2021.9506135
Yue Yin, C. McCarthy, Dana Rezazadegan
Interest in 3D hand pose estimation is rapidly growing, offering the potential for real-time hand gesture recognition in a range of interactive VR/AR applications, and beyond. Most current 3D hand pose estimation models rely on dedicated depth-sensing cameras and/or specialised hardware support to handle both the high computation and memory requirements. However, such requirements hinder the practical application of such models on mobile devices or in other embedded computing contexts. To address this, we propose a lightweight model for hand and object pose estimation specifically targeting mobile applications. Using RGB images only, we show how our approach achieves real-time performance, comparable accuracy, and an 81% model size reduction compared with state-of-the-art methods, thereby supporting the feasibility of the model for deployment on mobile platforms.
Citations: 1
Positional Encoding: Improving Class-Imbalanced Motorcycle Helmet use Classification
Pub Date: 2021-09-19 | DOI: 10.1109/ICIP42928.2021.9506178
Hanhe Lin, Guangan Chen, F. Siebert
Recent advances in the automated detection of motorcycle riders' helmet use have enabled road safety actors to process large-scale video data efficiently and with high accuracy. To distinguish drivers from passengers in helmet use, the most straightforward approach is to train a multi-class classifier in which each class corresponds to a specific combination of rider positions and individual riders' helmet use. However, such a strategy results in a long-tailed data distribution, with critically few samples for a number of uncommon classes. In this paper, we propose a novel approach to address this limitation. Let n be the maximum number of riders a motorcycle can hold; we encode helmet use on a motorcycle as a vector of 2n bits, where the first n bits denote whether the encoded positions have riders, and the latter n bits denote whether the rider in the corresponding position wears a helmet. With this positional encoding of helmet use, we propose a deep learning model built on an existing image classification architecture. The model simultaneously trains 2n binary classifiers, which provides more balanced samples for training. The method is simple to implement and requires no hyperparameter tuning. Experimental results demonstrate that our approach outperforms the state-of-the-art approaches by 1.9% in accuracy.
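A minimal sketch of the described encoding, assuming a maximum of four rider positions and the bit layout stated above (occupancy bits first, helmet bits second):

```python
import torch

def encode_helmet_use(riders, n_max=4):
    """Encode rider presence and helmet use on one motorcycle as a 2*n_max-bit target.

    riders: list of booleans, riders[i] is True if rider i wears a helmet.
    First n_max bits: position occupied or not; last n_max bits: helmet worn or not.
    (n_max=4 and this exact bit layout are assumptions for illustration.)
    """
    target = torch.zeros(2 * n_max)
    for i, wears_helmet in enumerate(riders[:n_max]):
        target[i] = 1.0                          # a rider sits at position i
        target[n_max + i] = float(wears_helmet)  # and does / does not wear a helmet
    return target

# Driver with helmet plus one passenger without: bits 1100 | 1000
print(encode_helmet_use([True, False]))

# The 2n bits can then be learned with 2n binary classifiers, e.g.:
# loss = torch.nn.functional.binary_cross_entropy_with_logits(model(images), targets)
```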
Citations: 3
A Foveated Video Quality Assessment Model Using Space-Variant Natural Scene Statistics
Pub Date: 2021-09-19 | DOI: 10.1109/ICIP42928.2021.9506032
Y. Jin, T. Goodall, Anjul Patney, R. Webb, A. Bovik
In Virtual Reality (VR) systems, head-mounted displays (HMDs) are widely used to present VR content. When displaying immersive (360° video) scenes, greater challenges arise from limits on computing power, frame rate, and transmission bandwidth. To address these problems, a variety of foveated video compression and streaming methods have been proposed, which seek to exploit the nonuniform sampling density of the retinal photoreceptors and ganglion cells, a density that decreases rapidly with increasing eccentricity. Creating foveated immersive video content leads to the need for specialized foveated video quality predictors. Here we propose a No-Reference (NR, or blind) method, which we call "Space-Variant BRISQUE (SV-BRISQUE)," based on a new space-variant natural scene statistics model. When tested on a large database of foveated, compression-distorted videos along with human opinions of them, our new model achieves state-of-the-art (SOTA) performance, with correlations of 0.88 / 0.90 (PLCC / SROCC) against human subjective judgments.
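For intuition only, the sketch below computes mean-subtracted contrast-normalized (MSCN) coefficients, the building block of BRISQUE-style natural scene statistics, and pools their variance in eccentricity bands around a fixation point; the box filter, the band layout, and the pooled statistic are simplifying assumptions, not the SV-BRISQUE features themselves.

```python
import numpy as np
from scipy.ndimage import uniform_filter

def mscn(image, win=7, c=1.0):
    """Mean-subtracted contrast-normalized coefficients (core of BRISQUE-style NSS).

    A box filter stands in for the Gaussian weighting used by BRISQUE proper.
    """
    mu = uniform_filter(image, win)
    sigma = np.sqrt(np.maximum(uniform_filter(image * image, win) - mu * mu, 0.0))
    return (image - mu) / (sigma + c)

def eccentricity_band_stats(image, fovea, n_bands=4):
    """Pool MSCN variance in rings of increasing eccentricity around the fixation point.

    Only illustrates the 'space-variant' idea; the actual SV-BRISQUE features and their
    eccentricity weighting are defined in the paper, not here.
    """
    coeffs = mscn(image.astype(np.float64))
    h, w = image.shape
    yy, xx = np.mgrid[0:h, 0:w]
    ecc = np.hypot(yy - fovea[0], xx - fovea[1])
    edges = np.linspace(0, ecc.max() + 1e-6, n_bands + 1)
    return [float(np.var(coeffs[(ecc >= lo) & (ecc < hi)])) for lo, hi in zip(edges[:-1], edges[1:])]

stats = eccentricity_band_stats(np.random.rand(240, 320), fovea=(120, 160))
```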
Citations: 3