
2021 IEEE International Conference on Image Processing (ICIP): Latest Publications

A Tilt-Angle Face Dataset And Its Validation
Pub Date : 2021-09-19 DOI: 10.1109/ICIP42928.2021.9506052
Nanxi Wang, Zhongyuan Wang, Zheng He, Baojin Huang, Liguo Zhou, Zhen Han
Since surveillance cameras are usually mounted at a high position to overlook targets, tilt-angle faces seen from an overhead view are common in public video surveillance environments. Face recognition approaches based on deep learning models have achieved excellent performance, but a large gap remains for such overhead surveillance scenarios. The results of face recognition depend not only on the structure of the model, but also on the completeness and diversity of the training samples. The existing multi-pose face datasets do not cover complete top-view face samples, and the models trained on them thus cannot provide satisfactory accuracy. To this end, this paper pioneers a multi-view tilt-angle face dataset (TFD), which is collected with elaborately devised overhead capture equipment. TFD contains 11,124 face images from 927 subjects, covering a variety of tilt angles on the overhead view. To verify the validity of the constructed dataset, we further conduct comprehensive face detection and recognition experiments using the corresponding models trained on WiderFace, Webface and our TFD, respectively. Experimental results show that our TFD substantially improves face detection and recognition accuracy in the top-view situation. TFD is available at https://github.com/huang1204510135/DFD.
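As a rough illustration of the validation protocol described above, the sketch below computes detection recall separately for each tilt-angle bin. The directory layout (one sub-folder per tilt angle), the detector interface and the ground-truth lookup are hypothetical placeholders; the paper does not specify how TFD is organized.

```python
# Hypothetical evaluation sketch: per-tilt-angle recall of a face detector.
# The folder layout and detector(image_path) -> list of boxes interface are
# assumptions for illustration only.
from pathlib import Path

def iou(box_a, box_b):
    """Intersection-over-union of two (x1, y1, x2, y2) boxes."""
    x1, y1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    x2, y2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter + 1e-9)

def recall_per_angle(dataset_root, detector, gt_boxes, thr=0.5):
    """gt_boxes maps an image path string to its ground-truth face box."""
    stats = {}
    for angle_dir in sorted(Path(dataset_root).iterdir()):
        hit, total = 0, 0
        for img in angle_dir.glob("*.jpg"):
            total += 1
            preds = detector(img)
            if any(iou(p, gt_boxes[str(img)]) >= thr for p in preds):
                hit += 1
        stats[angle_dir.name] = hit / max(total, 1)
    return stats
```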
Citations: 4
Compressive Covariance Matrix Estimation from a Dual-Dispersive Coded Aperture Spectral Imager
Pub Date : 2021-09-19 DOI: 10.1109/ICIP42928.2021.9506077
Jonathan Monsalve, M. Márquez, I. Esnaola, H. Arguello
Compressive covariance sampling (CCS) theory aims to recover the covariance matrix (CM) of a signal, instead of the signal itself, from a reduced set of random linear projections. Although several theoretical works demonstrate the CCS theory’s advantages in compressive spectral imaging tasks, a real optical implementation has not been proposed. Therefore, this paper proposes a compressive spectral sensing protocol for the dual-dispersive coded aperture spectral snapshot imager (DD-CASSI) to directly estimate the covariance matrix of the signal. Specifically, we propose a coded aperture design that allows recasting the vector sensing problem into matrix form, which makes it possible to exploit covariance matrix structure such as positive semidefiniteness, low rank, or Toeplitz structure. Additionally, a low-rank approximation of the image is reconstructed using a Principal Components Analysis (PCA) based method. To test the precision of the reconstruction, some spectral signatures of the image are captured with a spectrometer and compared with those obtained in the reconstruction using the covariance matrix. Results show the reconstructed spectrum is accurate, with a spectral angle mapper (SAM) of less than 14°. RGB composites of the spectral image also provide evidence of a correct color reconstruction.
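The recovery principle behind CCS can be stated compactly: for measurements y = Φx, the compressed covariance obeys Σ_y = Φ Σ_x Φᵀ, so Σ_x can be estimated from a sample estimate of Σ_y once structural priors such as low rank and positive semidefiniteness are imposed. The sketch below illustrates this with a projected-gradient recovery in NumPy; it is a toy stand-in for the paper's DD-CASSI protocol, and the dimensions, rank prior and optimizer are assumptions.

```python
import numpy as np

# Toy compressive covariance recovery (not the DD-CASSI optics): observe y = Phi x,
# form the compressed sample covariance Sigma_y ~ Phi Sigma_x Phi^T, then recover
# Sigma_x by projected gradient descent under a low-rank PSD constraint.
rng = np.random.default_rng(0)
n, m, k, r = 32, 16, 20000, 3              # signal dim, measurement dim, snapshots, assumed rank
U = rng.standard_normal((n, r))
Sigma_true = U @ U.T                       # rank-r ground-truth covariance
Phi = rng.standard_normal((m, n)) / np.sqrt(m)

X = U @ rng.standard_normal((r, k))        # zero-mean samples with covariance Sigma_true
Y = Phi @ X
Sigma_y = (Y @ Y.T) / k                    # compressed sample covariance

S = np.zeros((n, n))
step = 1.0 / np.linalg.norm(Phi, 2) ** 4   # conservative step size for the Frobenius fit
for _ in range(1000):
    S = S - step * (Phi.T @ (Phi @ S @ Phi.T - Sigma_y) @ Phi)   # gradient step
    S = 0.5 * (S + S.T)                    # symmetrize
    w, V = np.linalg.eigh(S)
    w = np.clip(w, 0.0, None)
    w[:-r] = 0.0                           # keep the r largest eigenvalues: PSD and rank-r
    S = (V * w) @ V.T

rel_err = np.linalg.norm(S - Sigma_true) / np.linalg.norm(Sigma_true)
print(f"relative covariance recovery error: {rel_err:.3f}")
```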
Citations: 4
Knowledge-Based Reasoning Network For Object Detection
Pub Date : 2021-09-19 DOI: 10.1109/ICIP42928.2021.9506228
Huigang Zhang, Liuan Wang, Jun Sun
The mainstream object detection algorithms rely on recognizing object instances individually, but do not consider the high-level relationships among objects in context. This inevitably leads to biased detection results, due to the lack of the commonsense knowledge that humans often use to assist object identification. In this paper, we present a novel reasoning module to endow current detection systems with the power of commonsense knowledge. Specifically, we use a graph attention network (GAT) to represent the knowledge among objects. The knowledge covers visual and semantic relations. Through the iterative update of the GAT, the object features can be enriched. Experiments on the COCO detection benchmark indicate that our knowledge-based reasoning network achieves consistent improvements over various CNN detectors. We achieve 1.9 and 1.8 points higher Average Precision (AP) than Faster-RCNN and Mask-RCNN respectively, when using ResNet50-FPN as the backbone.
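A single graph attention layer of the kind used to propagate relational knowledge between detected objects can be written in a few lines of PyTorch. The sketch below is a generic single-head GAT update over per-object RoI features with a binary relation mask; the feature sizes, relation graph and single-layer design are illustrative assumptions, not the paper's exact configuration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GATLayer(nn.Module):
    """Single-head graph attention layer used here to propagate relations
    between detected objects; sizes below are illustrative only."""
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.proj = nn.Linear(in_dim, out_dim, bias=False)
        self.attn = nn.Linear(2 * out_dim, 1, bias=False)

    def forward(self, h, adj):
        # h: (N, in_dim) per-object features, adj: (N, N) 0/1 relation mask
        z = self.proj(h)                                    # (N, out_dim)
        n = z.size(0)
        zi = z.unsqueeze(1).expand(n, n, -1)                # node i broadcast
        zj = z.unsqueeze(0).expand(n, n, -1)                # node j broadcast
        e = F.leaky_relu(self.attn(torch.cat([zi, zj], dim=-1)).squeeze(-1), 0.2)
        e = e.masked_fill(adj == 0, float("-inf"))          # attend only along relations
        alpha = torch.softmax(e, dim=-1)                    # (N, N) attention weights
        return F.elu(alpha @ z)                             # updated object features

# Toy usage: 4 detected objects with 256-d RoI features, fully connected relation graph.
feats = torch.randn(4, 256)
adj = torch.ones(4, 4)
enriched = GATLayer(256, 256)(feats, adj)   # enriched features before the detection head
```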
Citations: 2
Pixel-Wise Failure Prediction For Semantic Video Segmentation
Pub Date : 2021-09-19 DOI: 10.1109/ICIP42928.2021.9506552
Christopher B. Kuhn, M. Hofbauer, Ziqi Xu, G. Petrovic, E. Steinbach
We propose a pixel-accurate failure prediction approach for semantic video segmentation. The proposed scheme improves on previously proposed failure prediction methods, which have so far disregarded the temporal information in videos. Our approach consists of two main steps: First, we train an LSTM-based model to detect spatio-temporal patterns that indicate pixel-wise misclassifications in the current video frame. Second, we use sequences of failure predictions to train a denoising autoencoder that both refines the current failure prediction and predicts future misclassifications. Since public data sets for this scenario are limited, we introduce the large-scale densely annotated video driving (DAVID) data set generated using the CARLA simulator. We evaluate our approach on the real-world Cityscapes data set and the simulator-based DAVID data set. Our experimental results show that spatio-temporal failure prediction outperforms single-image failure prediction by up to 8.8%. Refining the prediction using a sequence of previous failure predictions further improves the performance by a significant 15.2% and allows misclassifications to be accurately predicted for future frames. While we focus our study on driving videos, the proposed approach is general and can easily be used in other scenarios as well.
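The first step, an LSTM-based model that turns a short sequence of per-frame inputs into a per-pixel failure probability, can be sketched with a compact convolutional LSTM cell as below. The input choice (e.g. per-frame softmax-entropy maps), the layer sizes and the single-cell design are assumptions for illustration; the paper's architecture may differ.

```python
import torch
import torch.nn as nn

class ConvLSTMCell(nn.Module):
    """Compact convolutional LSTM cell; a stand-in for the paper's LSTM-based
    spatio-temporal model (layer sizes are illustrative only)."""
    def __init__(self, in_ch, hid_ch, k=3):
        super().__init__()
        self.hid_ch = hid_ch
        self.gates = nn.Conv2d(in_ch + hid_ch, 4 * hid_ch, k, padding=k // 2)

    def forward(self, x, state):
        h, c = state
        i, f, o, g = torch.chunk(self.gates(torch.cat([x, h], dim=1)), 4, dim=1)
        i, f, o, g = torch.sigmoid(i), torch.sigmoid(f), torch.sigmoid(o), torch.tanh(g)
        c = f * c + i * g
        h = o * torch.tanh(c)
        return h, c

class FailurePredictor(nn.Module):
    """Predicts a per-pixel probability that the current segmentation is wrong,
    from a short sequence of per-frame inputs (e.g. softmax-entropy maps)."""
    def __init__(self, in_ch=1, hid_ch=16):
        super().__init__()
        self.cell = ConvLSTMCell(in_ch, hid_ch)
        self.head = nn.Conv2d(hid_ch, 1, 1)

    def forward(self, seq):                        # seq: (B, T, C, H, W)
        b, t, _, hgt, wid = seq.shape
        h = seq.new_zeros(b, self.cell.hid_ch, hgt, wid)
        cst = torch.zeros_like(h)
        for step in range(t):
            h, cst = self.cell(seq[:, step], (h, cst))
        return torch.sigmoid(self.head(h))         # (B, 1, H, W) failure map

# Toy usage: 4-frame sequence of entropy maps, one failure map for the last frame.
failure_map = FailurePredictor()(torch.randn(2, 4, 1, 64, 64))
```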
Citations: 9
Progressive Learning With Anchoring Regularization For Vehicle Re-Identification
Pub Date : 2021-09-19 DOI: 10.1109/ICIP42928.2021.9506152
Mohamed Dhia Besbes, Hedi Tabia, Yousri Kessentini, Bassem Ben Hamed
Vehicle re-identification (re-ID) aims to automatically find a vehicle's identity from a large number of vehicle images captured by multiple cameras. Most existing vehicle re-ID approaches rely on fully supervised learning methodologies, which require large amounts of annotated training data and are therefore expensive. In this paper, we focus on semi-supervised vehicle re-ID, where each identity has a single labeled and multiple unlabeled samples in the training set. We propose a framework which gradually labels vehicle images taken from surveillance cameras. Our framework is based on a deep Convolutional Neural Network (CNN), which is progressively learned using a feature anchoring regularization process. Experiments conducted on various publicly available datasets demonstrate the efficiency of our framework on re-ID tasks. With only 20% labeled data, our approach shows competitive performance compared to state-of-the-art supervised methods trained on fully labeled data.
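One progressive round of the kind described above can be sketched as: embed the unlabeled images, assign each to its nearest labeled-identity anchor if the match is confident enough, and regularize training by pulling embeddings towards their anchors. The confidence rule, temperature and the L2 anchoring form below are assumptions, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def anchoring_loss(feats, labels, anchors):
    """Pull embeddings towards the anchor embedding of their (pseudo-)identity.
    This specific L2 form is an assumption; the paper's regularizer may differ."""
    return F.mse_loss(feats, anchors[labels])

@torch.no_grad()
def pseudo_label(model, unlabeled, anchors, conf_thr=0.8):
    """Assign each unlabeled image to its nearest identity anchor, keeping only
    confident matches; confidence here is a softmax over cosine similarities."""
    feats = F.normalize(model(unlabeled), dim=1)             # (N, D) embeddings
    sims = feats @ F.normalize(anchors, dim=1).t()           # (N, K) cosine similarity
    conf, ids = torch.softmax(sims / 0.05, dim=1).max(dim=1)
    keep = conf > conf_thr
    return unlabeled[keep], ids[keep]                        # newly labeled samples
```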
Citations: 0
Block-Based Inter-Frame Prediction For Dynamic Point Cloud Compression
Pub Date : 2021-09-19 DOI: 10.1109/ICIP42928.2021.9506355
Cristiano Santos, Mateus M. Gonçalves, G. Corrêa, M. Porto
In recent years, 3D point clouds have gained popularity thanks to technological advances such as increased computational power and the availability of low-cost devices for acquiring 3D information, like RGBD sensors. However, raw point clouds demand a large amount of data for their representation, and compression is mandatory to allow efficient transmission and storage. Inter-frame prediction is a widely used approach to achieve high compression rates in 2D video encoders, but the current literature still lacks solutions that efficiently exploit temporal redundancy for point cloud encoding. In this work, we propose a novel inter-frame prediction scheme for 3D point cloud compression, which exploits temporal redundancies in 3D space. Moreover, a mode decision algorithm is also proposed to dynamically choose the best encoding mode between inter and intra prediction. The proposed method yields a bitrate reduction of 15.6% and 3.5% for geometry and luma information respectively, with no significant impact on objective quality when compared to the MPEG 3DG solution, called G-PCC.
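At its core, block-based inter prediction amounts to deciding, for each block of points, whether predicting it from the previous frame is cheaper than coding it independently. The sketch below shows a toy version of that decision using a translational motion estimate and a nearest-neighbour distortion; the block partitioning, distortion measure and bit estimates are simplifications and not the paper's actual encoder.

```python
import numpy as np
from scipy.spatial import cKDTree

def block_mode_decision(curr_block, prev_frame, intra_cost, lambda_rate=1.0, mv_bits=16):
    """Toy inter/intra mode decision for one block of points (N, 3).
    The pure-translation motion model, point-to-point MSE distortion and fixed
    bit estimates are illustrative simplifications."""
    tree = cKDTree(prev_frame)
    _, idx = tree.query(curr_block)
    mv = (prev_frame[idx] - curr_block).mean(axis=0)   # translational motion vector
    pred = curr_block + mv                             # block predicted from the previous frame
    d_pred, _ = tree.query(pred)
    inter_cost = float(np.mean(d_pred ** 2)) + lambda_rate * mv_bits
    return ("inter", mv, inter_cost) if inter_cost < intra_cost else ("intra", None, intra_cost)

# Toy usage: a slightly shifted block should favor inter prediction.
rng = np.random.default_rng(0)
prev = rng.random((500, 3))
curr_block = prev[:40] + np.array([0.01, 0.0, 0.0])
mode, mv, cost = block_mode_decision(curr_block, prev, intra_cost=50.0)
```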
Citations: 6
S2D2Net: An Improved Approach For Robust Steel Surface Defects Diagnosis With Small Sample Learning
Pub Date : 2021-09-19 DOI: 10.1109/ICIP42928.2021.9506405
Vikanksh Nath, C. Chattopadhyay
Surface defect recognition is a necessary process to guarantee the quality of industrial production. This paper proposes a hybrid model, S2D2Net (Steel Surface Defect Diagnosis Network), for efficient and robust inspection of the steel surface during the manufacturing process. S2D2Net uses a pretrained ImageNet model as a feature extractor and learns a Capsule Network over the extracted features. Experimental results on a publicly available steel surface defect dataset (NEU) show that S2D2Net achieved 99.17% accuracy with minimal training data, an improvement of 9.59% over its closest GAN-based competitor. S2D2Net proved its robustness by achieving 94.7% accuracy on a diversity-enhanced dataset, ENEU, improving by 3.6% over its closest competitor. It shows better and more robust recognition performance than other state-of-the-art DNN-based detectors.
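The hybrid layout, a frozen ImageNet backbone feeding a capsule head with dynamic routing, can be sketched in PyTorch as below. The choice of ResNet-18, the capsule dimensions and the number of routing iterations are assumptions; only the overall structure (pretrained extractor plus Capsule Network classifier, suited to small-sample training) follows the paper.

```python
import torch
import torch.nn as nn
from torchvision import models

def squash(s, dim=-1):
    """Capsule squashing non-linearity."""
    n2 = (s ** 2).sum(dim=dim, keepdim=True)
    return (n2 / (1.0 + n2)) * s / torch.sqrt(n2 + 1e-9)

class CapsuleHead(nn.Module):
    """Class capsules with dynamic routing over primary capsules derived from a
    frozen backbone; all sizes and the 6 output classes (as in NEU) are assumptions."""
    def __init__(self, in_dim=512, n_primary=32, prim_dim=8, n_classes=6, out_dim=16, iters=3):
        super().__init__()
        self.primary = nn.Linear(in_dim, n_primary * prim_dim)
        self.n_primary, self.prim_dim, self.iters = n_primary, prim_dim, iters
        self.W = nn.Parameter(0.01 * torch.randn(n_primary, n_classes, out_dim, prim_dim))

    def forward(self, feats):                                   # feats: (B, in_dim)
        u = squash(self.primary(feats).view(-1, self.n_primary, self.prim_dim))
        u_hat = torch.einsum("ijkl,bil->bijk", self.W, u)       # per-primary class predictions
        b = torch.zeros(u_hat.shape[:3], device=feats.device)   # routing logits
        for _ in range(self.iters):
            c = torch.softmax(b, dim=2)                         # coupling coefficients
            v = squash((c.unsqueeze(-1) * u_hat).sum(dim=1))    # (B, n_classes, out_dim)
            b = b + torch.einsum("bijk,bjk->bij", u_hat, v)     # agreement update
        return v.norm(dim=-1)                                   # class scores (capsule lengths)

backbone = models.resnet18(weights="IMAGENET1K_V1")
backbone.fc = nn.Identity()                  # use the 512-d pooled features
for p in backbone.parameters():
    p.requires_grad = False                  # small-sample regime: freeze the extractor
scores = CapsuleHead()(backbone(torch.randn(2, 3, 224, 224)))   # (2, 6) defect-class scores
```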
Citations: 4
Hybrid Deep Learning Model For Diagnosis Of Covid-19 Using Ct Scans And Clinical/Demographic Data
Pub Date : 2021-09-19 DOI: 10.1109/ICIP42928.2021.9506661
Parnian Afshar, Shahin Heidarian, F. Naderkhani, M. Rafiee, A. Oikonomou, K. Plataniotis, Arash Mohammadi
The unprecedented COVID-19 pandemic has had a remarkable impact on the world, influencing broad aspects of people’s lives since its first emergence in late 2019. The highly contagious nature of COVID-19 has raised the necessity of developing deep learning-based diagnostic tools to identify infected cases in the early stages. Recently, we proposed a fully-automated framework based on Capsule Networks, referred to as CT-CAPS, to distinguish COVID-19 infection from normal and Community Acquired Pneumonia (CAP) cases using chest Computed Tomography (CT) scans. Although CT scans can provide a comprehensive illustration of lung abnormalities, COVID-19 lung manifestations highly overlap with CAP findings, making their identification challenging even for experienced radiologists. Here, CT-CAPS is augmented with a wide range of clinical/demographic data, including patients’ gender, age, weight and symptoms. More specifically, we propose a hybrid deep learning model that utilizes both clinical/demographic data and CT scans to classify COVID-19 and non-COVID cases using a Random Forest classifier. The proposed hybrid model specifies the most important predictive factors, increasing the explainability of the model. The experimental results show that the proposed hybrid model improves CT-CAPS performance, achieving an accuracy of 90.8%, a sensitivity of 94.5% and a specificity of 86.0%.
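The fusion step itself is straightforward: concatenate the CT-derived feature vector with the clinical/demographic fields and fit a Random Forest, whose feature importances expose the most predictive factors. The sketch below uses synthetic data and scikit-learn; the feature dimensions and field list are placeholders, not the paper's actual inputs.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Illustrative fusion: rows = patients, columns = an image-derived feature vector
# (stand-in for CT-CAPS output) followed by clinical/demographic fields. All values
# and labels here are synthetic.
rng = np.random.default_rng(0)
n_patients, n_caps_feats = 200, 32
caps_feats = rng.standard_normal((n_patients, n_caps_feats))
clinical = np.column_stack([
    rng.integers(0, 2, n_patients),        # gender
    rng.integers(20, 90, n_patients),      # age
    rng.normal(75, 12, n_patients),        # weight
    rng.integers(0, 2, n_patients),        # symptom flag (e.g. cough)
])
X = np.hstack([caps_feats, clinical])
y = rng.integers(0, 2, n_patients)         # synthetic COVID / non-COVID labels

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)
clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_tr, y_tr)
print("held-out accuracy:", clf.score(X_te, y_te))
# Feature importances indicate the most predictive factors (image vs clinical columns).
print("top factors:", np.argsort(clf.feature_importances_)[::-1][:5])
```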
Citations: 6
Multi-Task Occlusion Learning for Real-Time Visual Object Tracking
Pub Date : 2021-09-19 DOI: 10.1109/ICIP42928.2021.9506239
Gozde Sahin, L. Itti
Occlusion handling is one of the important challenges in the field of visual tracking, especially for real-time applications, where further processing for occlusion reasoning may not always be possible. In this paper, an occlusion-aware real-time object tracker is proposed, which enhances the baseline SiamRPN model with an additional branch that directly predicts the occlusion level of the object. Experimental results on the GOT-10k and VOT benchmarks show that learning to predict occlusion levels end-to-end in this multi-task learning framework helps improve tracking accuracy, especially on frames that contain occlusions. Up to 7% improvement in EAO scores can be observed for occluded frames, which make up only 11% of the data. The performance results over all frames also indicate that the model compares favorably to the other trackers.
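In a multi-task setup like this, the extra branch simply contributes one more term to the tracking objective. The sketch below combines the usual SiamRPN classification and regression losses with an occlusion-level term; treating occlusion as a scalar regressed with an L1 loss and the equal weighting are assumptions for illustration, not the paper's exact objective.

```python
import torch
import torch.nn.functional as F

def multitask_tracking_loss(cls_logits, cls_targets, reg_pred, reg_targets,
                            occ_pred, occ_target, w_occ=1.0):
    """Baseline classification + regression losses plus an occlusion-level term
    from the extra branch; the loss forms and weight are illustrative choices."""
    loss_cls = F.cross_entropy(cls_logits, cls_targets)       # anchor fg/bg classification
    loss_reg = F.smooth_l1_loss(reg_pred, reg_targets)        # anchor box regression
    loss_occ = F.l1_loss(occ_pred, occ_target)                # predicted occlusion level
    return loss_cls + loss_reg + w_occ * loss_occ

# Toy usage with random tensors (anchors flattened for brevity).
loss = multitask_tracking_loss(
    torch.randn(8, 2), torch.randint(0, 2, (8,)),
    torch.randn(8, 4), torch.randn(8, 4),
    torch.rand(8), torch.rand(8),
)
```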
Citations: 3
A Novel Method For Segmentation Of Breast Masses Based On Mammography Images
Pub Date : 2021-09-19 DOI: 10.1109/ICIP42928.2021.9506159
Haichao Cao, Shiliang Pu, Wenming Tan
The accurate segmentation of breast masses in mammography images is a key step in the diagnosis of early breast cancer. To handle the varied shapes and sizes of breast masses, this paper proposes a cascaded UNet architecture, referred to as CasUNet. CasUNet contains six UNet subnetworks whose depth increases from 1 to 6, and the output features of adjacent subnetworks are cascaded. Furthermore, we integrate a channel attention mechanism into CasUNet so that it can focus on the important feature maps. Because the edges of irregular breast masses are difficult to segment, a multi-stage cascaded training method is presented which gradually expands the context information of breast masses to assist the training of the segmentation model. To alleviate the problem of limited training samples, a data augmentation method based on background migration is proposed. This method transfers the background of unlabeled samples to labeled samples through the histogram specification technique, thereby improving the diversity of the training data. The above method has been experimentally verified on two datasets, INbreast and DDSM. Experimental results show that the proposed method achieves competitive segmentation performance.
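The background-migration augmentation maps a labeled image's intensity distribution towards that of an unlabeled image via histogram specification, while the annotated mass region is kept intact so the label remains valid. A minimal sketch using scikit-image is shown below; preserving the foreground pixels exactly is an assumption about the method's details.

```python
import numpy as np
from skimage.exposure import match_histograms

def background_migration(labeled_img, labeled_mask, unlabeled_img):
    """Histogram-specification augmentation: move the labeled image's intensity
    distribution towards an unlabeled image, then restore the annotated mass
    pixels so the ground-truth mask stays valid (an assumed detail)."""
    migrated = match_histograms(labeled_img, unlabeled_img)
    out = migrated.copy()
    out[labeled_mask > 0] = labeled_img[labeled_mask > 0]   # keep the annotated mass region
    return out

# Toy usage on synthetic mammogram-like arrays.
rng = np.random.default_rng(0)
img = rng.normal(0.4, 0.1, (256, 256))
mask = np.zeros((256, 256))
mask[100:140, 100:140] = 1
ref = rng.normal(0.6, 0.15, (256, 256))
aug = background_migration(img, mask, ref)
```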
Citations: 2