
Latest publications: 2020 25th International Conference on Pattern Recognition (ICPR)

Contrastive Data Learning for Facial Pose and Illumination Normalization
Pub Date: 2021-01-10 | DOI: 10.1109/ICPR48806.2021.9412811 | Pages: 8336-8343
G. Hsu, Chia-Hao Tang, S. Yanushkevich, M. Gavrilova
Face normalization can be a crucial step when handling generic face recognition. We propose the Pose and Illumination Normalization (PIN) framework with contrastive data learning for face normalization. The PIN framework is designed to learn the transformation from a source set to a target set, which together compose a contrastive data set for learning. The source set contains faces collected in the wild and thus covers a wide range of variation in illumination, pose, expression, and other variables. The target set contains face images taken under controlled conditions: all faces are in frontal pose and evenly illuminated. The PIN framework is composed of an encoder, a decoder, and two discriminators. The encoder is a state-of-the-art face recognition network that acts as a facial feature extractor and is not updated during training. The decoder is trained on both the source and target sets and aims to learn the transformation from the source set to the target set; it can therefore transform an arbitrary face into an illumination- and pose-normalized face. The discriminators are trained to ensure the photo-realistic quality of the normalized face images generated by the decoder. The loss functions employed in the decoder and discriminators are designed and weighted to yield better normalization outcomes and recognition performance. We verify the performance of the proposed framework on several benchmark databases and compare it with state-of-the-art approaches.
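To make the encoder-decoder-discriminator layout concrete, here is a minimal PyTorch sketch of a PIN-style training step: a frozen encoder extracts features, a decoder maps source faces toward the target distribution, and a discriminator is trained adversarially against it. All module shapes, the single discriminator (the paper uses two), and the omitted identity/pixel loss terms are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyEncoder(nn.Module):            # stand-in for the pretrained recognition net
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(3, 64, 4, stride=4)     # 64x64 -> 16x16 features
    def forward(self, x):
        return self.conv(x)

class TinyDecoder(nn.Module):            # learns the source -> target transformation
    def __init__(self):
        super().__init__()
        self.deconv = nn.ConvTranspose2d(64, 3, 4, stride=4)
    def forward(self, f):
        return torch.tanh(self.deconv(f))

class TinyDisc(nn.Module):               # judges photo-realism of normalized faces
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Conv2d(3, 8, 4, stride=4), nn.Flatten(),
                                 nn.Linear(8 * 16 * 16, 1))   # 64x64 input assumed
    def forward(self, x):
        return self.net(x)

enc, dec, disc = TinyEncoder(), TinyDecoder(), TinyDisc()
for p in enc.parameters():               # the encoder is not updated during training
    p.requires_grad_(False)
opt_g = torch.optim.Adam(dec.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(disc.parameters(), lr=2e-4)

src = torch.rand(4, 3, 64, 64)           # in-the-wild faces (source set)
tgt = torch.rand(4, 3, 64, 64)           # frontal, evenly lit faces (target set)

# Discriminator step: real target faces vs. detached normalized fakes.
fake = dec(enc(src)).detach()
loss_d = F.binary_cross_entropy_with_logits(disc(tgt), torch.ones(4, 1)) \
       + F.binary_cross_entropy_with_logits(disc(fake), torch.zeros(4, 1))
opt_d.zero_grad(); loss_d.backward(); opt_d.step()

# Decoder step: fool the discriminator; weighted identity/pixel losses
# would be added here to steer normalization and recognition quality.
loss_g = F.binary_cross_entropy_with_logits(disc(dec(enc(src))), torch.ones(4, 1))
opt_g.zero_grad(); loss_g.backward(); opt_g.step()
```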
Citations: 0
ACRM: Attention Cascade R-CNN with Mix-NMS for Metallic Surface Defect Detection
Pub Date: 2021-01-10 | DOI: 10.1109/ICPR48806.2021.9412424 | Pages: 423-430
Junting Fang, Xiaoyang Tan, Yuhui Wang
Metallic surface defect detection is of great significance for quality control in production. However, the task is very challenging due to noise disturbance, large appearance variation, and the ambiguous definition of an individual defect. Traditional image processing methods are unable to detect damaged regions effectively and efficiently. In this paper, we propose a new defect detection method, Attention Cascade R-CNN with Mix-NMS (ACRM), to classify and locate defects robustly. Three submodules are developed to achieve this goal: 1) a lightweight attention block is introduced, which improves the ability to capture global and local features in both the spatial and channel dimensions; 2) we first apply cascade R-CNN to our task, exploiting multiple detectors to sequentially and robustly refine the detection results; 3) we introduce a new method named Mix Non-Maximum Suppression (Mix-NMS), which significantly improves the filtering of redundant detection results in our task. Extensive experiments on a real industrial dataset show that ACRM achieves state-of-the-art results compared to existing methods, demonstrating the effectiveness and robustness of our detection method.
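As an illustration of point 1), below is a minimal PyTorch sketch of a lightweight attention block that re-weights features along the channel and then the spatial dimension, in the spirit of CBAM-style designs; the pooling choices, reduction ratio, and kernel size are assumptions, not the paper's exact block.

```python
import torch
import torch.nn as nn

class ChannelSpatialAttention(nn.Module):
    def __init__(self, channels, reduction=8):
        super().__init__()
        # Channel branch: squeeze spatial dims, re-weight channels via an MLP.
        self.channel_mlp = nn.Sequential(
            nn.Linear(channels, channels // reduction), nn.ReLU(),
            nn.Linear(channels // reduction, channels), nn.Sigmoid())
        # Spatial branch: 7x7 conv over pooled per-pixel channel statistics.
        self.spatial_conv = nn.Sequential(
            nn.Conv2d(2, 1, kernel_size=7, padding=3), nn.Sigmoid())

    def forward(self, x):
        b, c, _, _ = x.shape
        w = self.channel_mlp(x.mean(dim=(2, 3))).view(b, c, 1, 1)
        x = x * w                                       # channel re-weighting
        stats = torch.cat([x.mean(dim=1, keepdim=True),
                           x.amax(dim=1, keepdim=True)], dim=1)
        return x * self.spatial_conv(stats)             # spatial re-weighting

feat = torch.rand(2, 64, 32, 32)
print(ChannelSpatialAttention(64)(feat).shape)          # torch.Size([2, 64, 32, 32])
```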
Citations: 0
Enhancing Deep Semantic Segmentation of RGB-D Data with Entangled Forests
Pub Date: 2021-01-10 | DOI: 10.1109/ICPR48806.2021.9412787 | Pages: 4634-4641
Matteo Terreran, Elia Bonetto, S. Ghidoni
Semantic segmentation is a problem receiving more and more attention in the computer vision community. Deep learning methods now represent the state of the art for this problem, and the trend is to use deeper networks for higher performance. The drawback of such models is their higher computational cost, which makes it difficult to deploy them on mobile robot platforms. In this work we explore how to obtain lighter deep learning models without compromising performance. To do so, we consider the features used in the 3D Entangled Forests algorithm and study the best strategies for integrating them within the FuseNet deep network. These additional features allow us to shrink the network size without losing performance, obtaining a lighter model that achieves state-of-the-art performance on the semantic segmentation task and represents an interesting alternative for mobile robotics applications, where computational power and energy are limited.
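A toy sketch of the integration strategy described above: per-pixel hand-crafted features (such as those used by 3D Entangled Forests) are stacked as extra input channels alongside RGB-D before a slimmer encoder. The channel counts, resolutions, and class count below are illustrative assumptions, not the actual FuseNet modification.

```python
import torch
import torch.nn as nn

rgb   = torch.rand(1, 3, 120, 160)   # color image
depth = torch.rand(1, 1, 120, 160)   # aligned depth map
ef    = torch.rand(1, 8, 120, 160)   # precomputed per-pixel geometric features

x = torch.cat([rgb, depth, ef], dim=1)           # 12-channel input tensor
encoder = nn.Sequential(                          # slimmer than an RGB-only net
    nn.Conv2d(12, 32, 3, padding=1), nn.ReLU(),
    nn.Conv2d(32, 32, 3, padding=1), nn.ReLU())
head = nn.Conv2d(32, 13, 1)                       # e.g., 13 semantic classes
logits = head(encoder(x))
print(logits.shape)                               # torch.Size([1, 13, 120, 160])
```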
Citations: 1
Self-Supervised Learning with Graph Neural Networks for Region of Interest Retrieval in Histopathology
Pub Date: 2021-01-10 | DOI: 10.1109/ICPR48806.2021.9412903 | Pages: 6329-6334
Yigit Ozen, S. Aksoy, K. Kösemehmetoğlu, S. Önder, A. Üner
Deep learning has achieved successful performance in representation learning and content-based retrieval of histopathology images. The common setting in deep learning-based approaches is supervised training of deep neural networks for classification, followed by using the trained model to extract representations for computing and ranking the distances between images. However, two major challenges remain. First, supervised training of deep neural networks requires a large amount of manually labeled data, which is often limited in the medical field. Transfer learning has been used to overcome this challenge, but its success has remained limited. Second, clinical practice in histopathology necessitates working with regions of interest (ROIs) of multiple diagnostic classes with arbitrary shapes and sizes. The typical solution is to aggregate the representations of fixed-sized patches cropped from these regions to obtain region-level representations. However, naive methods cannot sufficiently exploit the rich contextual information in complex tissue structures. To tackle these two challenges, we propose a generic method that utilizes graph neural networks (GNNs) combined with a self-supervised training method using a contrastive loss. A GNN enables representing arbitrarily shaped ROIs as graphs and encoding contextual information. Self-supervised contrastive learning improves the quality of the learned representations without requiring labeled data. Experiments on a challenging breast histopathology data set show that the proposed method achieves better performance than the state of the art.
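A compact sketch of the two ingredients: patch embeddings of an ROI become nodes of a graph processed by a simple message-passing layer, and two augmented views of the same ROI are pulled together with an NT-Xent-style contrastive loss. The mean-aggregation layer, dense adjacency, and all sizes are assumptions for illustration; the paper's GNN architecture may differ.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MeanGNNLayer(nn.Module):
    """One round of mean-neighbor message passing over a dense adjacency."""
    def __init__(self, dim):
        super().__init__()
        self.lin = nn.Linear(dim, dim)
    def forward(self, h, adj):                    # h: (N, d), adj: (N, N)
        deg = adj.sum(-1, keepdim=True).clamp(min=1)
        return F.relu(self.lin(adj @ h / deg))

def nt_xent(z1, z2, tau=0.1):
    """Contrastive loss pulling matched ROI views together (diagonal pairs)."""
    z1, z2 = F.normalize(z1, dim=1), F.normalize(z2, dim=1)
    logits = z1 @ z2.t() / tau
    return F.cross_entropy(logits, torch.arange(z1.size(0)))

gnn = MeanGNNLayer(128)
adj = (torch.rand(20, 20) > 0.7).float()          # patch adjacency within an ROI
# Two augmented views of a batch of 4 ROIs, each with 20 patch embeddings.
views1 = torch.rand(4, 20, 128)
views2 = torch.rand(4, 20, 128)
z1 = torch.stack([gnn(v, adj).mean(0) for v in views1])   # graph-level readout
z2 = torch.stack([gnn(v, adj).mean(0) for v in views2])
print(nt_xent(z1, z2))
```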
Citations: 8
CT-UNet: An Improved Neural Network Based on U-Net for Building Segmentation in Remote Sensing Images
Pub Date: 2021-01-10 | DOI: 10.1109/ICPR48806.2021.9412355 | Pages: 166-172
Huanran Ye, Sheng Liu, K. Jin, Haohao Cheng
With the proliferation of remote sensing images, segmenting buildings more accurately in such images is a critical challenge. First, the high resolution leads to blurred boundaries in the extracted building maps. Second, the similarity between buildings and background results in intra-class inconsistency. To address these two problems, we propose a UNet-based network named Context-Transfer-UNet (CT-UNet). Specifically, we design a Dense Boundary Block (DBB): the Dense Block exploits a feature-reuse mechanism to refine features and increase recognition capability, while the Boundary Block introduces low-level spatial information to solve the fuzzy boundary problem. Then, to handle intra-class inconsistency, we construct a Spatial Channel Attention Block (SCAB), which combines contextual spatial information and selects more distinguishable features across space and channels. Finally, we propose a novel loss function that incorporates an evaluation indicator. Based on the proposed CT-UNet, we achieve 85.33% mean IoU on the Inria dataset and 91.00% mean IoU on the WHU dataset, outperforming our baseline (U-Net ResNet-34) by 3.76% and Web-Net by 2.24%.
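To illustrate the feature-reuse mechanism behind the Dense Block, here is a minimal PyTorch dense block in which each layer consumes the concatenation of all earlier outputs; the growth rate and depth are assumptions, not CT-UNet's actual DBB configuration.

```python
import torch
import torch.nn as nn

class DenseBlock(nn.Module):
    def __init__(self, in_ch, growth=16, layers=3):
        super().__init__()
        self.convs = nn.ModuleList([
            nn.Conv2d(in_ch + i * growth, growth, 3, padding=1)
            for i in range(layers)])

    def forward(self, x):
        feats = [x]
        for conv in self.convs:
            # Each layer sees the concatenation of the input and all
            # earlier layer outputs: this is the reuse mechanism.
            feats.append(torch.relu(conv(torch.cat(feats, dim=1))))
        return torch.cat(feats, dim=1)

x = torch.rand(1, 32, 64, 64)
print(DenseBlock(32)(x).shape)   # torch.Size([1, 80, 64, 64]) = 32 + 3 * 16
```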
Citations: 6
Learning Interpretable Representation for 3D Point Clouds
Pub Date: 2021-01-10 | DOI: 10.1109/ICPR48806.2021.9412440 | Pages: 7470-7477
Feng-Guang Su, Ci-Siang Lin, Y. Wang
Point clouds have emerged as a popular representation of 3D visual data. Given a set of unordered 3D points, one typically transforms them into a latent representation before further classification and segmentation tasks. However, such an encoded latent representation is not easily interpretable. To address this issue, we propose a unique deep learning framework for disentangling body-type and pose information from 3D point clouds. Extending the autoencoder, we apply adversarial learning to a selected feature type, while classification and data recovery can additionally be observed. Our experiments confirm that our model can be successfully applied to a wide range of 3D applications such as shape synthesis, action translation, shape/action interpolation, and synchronization.
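A toy sketch of the disentanglement idea: a PointNet-style encoder splits its latent code into a body-type half and a pose half, and swapping one half between two shapes recombines them. The architecture, sizes, and the omission of the adversarial feature classifier are simplifying assumptions.

```python
import torch
import torch.nn as nn

class PointEncoder(nn.Module):               # PointNet-style, order-invariant
    def __init__(self, d=64):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(3, 64), nn.ReLU(),
                                 nn.Linear(64, 2 * d))
    def forward(self, pts):                  # pts: (B, N, 3)
        z = self.mlp(pts).max(dim=1).values  # max-pool over unordered points
        return z.chunk(2, dim=1)             # split into (body, pose) halves

class PointDecoder(nn.Module):
    def __init__(self, d=64, n=256):
        super().__init__()
        self.n = n
        self.mlp = nn.Sequential(nn.Linear(2 * d, 256), nn.ReLU(),
                                 nn.Linear(256, n * 3))
    def forward(self, body, pose):
        out = self.mlp(torch.cat([body, pose], dim=1))
        return out.view(-1, self.n, 3)

enc, dec = PointEncoder(), PointDecoder()
a, b = torch.rand(1, 256, 3), torch.rand(1, 256, 3)
body_a, _pose_a = enc(a)
_body_b, pose_b = enc(b)
swapped = dec(body_a, pose_b)   # body type of A performing the pose of B
print(swapped.shape)            # torch.Size([1, 256, 3])
```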
Citations: 2
Correlation-based ConvNet for Small Object Detection in Videos
Pub Date: 2021-01-10 | DOI: 10.1109/ICPR48806.2021.9413127 | Pages: 1979-1984
Brais Bosquet, M. Mucientes, V. Brea
The detection of small objects is of particular interest in many real applications. In this paper, we propose STDnet-ST, a novel approach to small object detection in video that uses spatial information alongside temporal video information. STDnet-ST is an end-to-end spatio-temporal convolutional neural network that detects small objects over time and correlates pairs of the top-ranked regions with the highest likelihood of containing small objects. This architecture links the small objects across time as tubelets and can dismiss unprofitable object links in order to provide high-quality tubelets. STDnet-ST achieves state-of-the-art results for small objects on the publicly available USC-GRAD-STDdb and UAVDT video datasets.
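A plain-Python sketch of the tubelet notion: boxes detected in consecutive frames are linked greedily by IoU with the tail of each existing tube, and short tubes are dismissed. The greedy matching, thresholds, and the absence of the learned correlation module are simplifying assumptions.

```python
def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix = max(0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union else 0.0

def link_tubelets(frames, iou_thr=0.3, min_len=3):
    tubes = []                                    # each tube is a list of boxes
    for boxes in frames:                          # detections in one frame
        unmatched = list(boxes)
        for tube in tubes:
            best = max(unmatched, key=lambda b: iou(tube[-1], b), default=None)
            if best is not None and iou(tube[-1], best) >= iou_thr:
                tube.append(best)                 # extend tube with best match
                unmatched.remove(best)
        tubes.extend([b] for b in unmatched)      # leftover boxes start new tubes
    return [t for t in tubes if len(t) >= min_len]  # dismiss unprofitable links

frames = [[(10, 10, 20, 20)], [(11, 11, 21, 21)], [(12, 12, 22, 22)]]
print(link_tubelets(frames))                      # one tubelet over three frames
```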
Citations: 1
GPSRL: Learning Semi-Parametric Bayesian Survival Rule Lists from Heterogeneous Patient Data
Pub Date: 2021-01-10 | DOI: 10.1109/ICPR48806.2021.9413157 | Pages: 10608-10615
Ameer Hamza Shakur, Xiaoning Qian, Zhangyang Wang, B. Mortazavi, Shuai Huang
Survival data is often collected in medical applications from a heterogeneous population of patients. While popular survival models of the past focused on modeling the average effect of covariates on survival outcomes, rapidly advancing sensing and information technologies have provided opportunities to further model the heterogeneity of the population as well as the non-linearity of the survival risk. With this motivation, we propose a new semi-parametric Bayesian Survival Rule List model. Our model derives a rule-based decision-making approach, and within the regime defined by each rule, survival risk is modeled via a Gaussian process latent variable model. Markov chain Monte Carlo with a nested Laplace approximation of the Gaussian process posterior is used to search over the posterior of the rule lists efficiently. The use of ordered rule lists enables us to model heterogeneity while keeping the model complexity in check. Performance evaluations on a synthetic heterogeneous survival dataset and a real-world sepsis survival dataset demonstrate the effectiveness of our model.
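A schematic sketch of the decision structure only: an ordered rule list routes a patient to the first matching rule, and each rule owns its own risk model (a Gaussian process latent variable model in the paper; constant placeholders here). The rules, features, and numbers are invented purely for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

@dataclass
class Rule:
    condition: Callable[[Dict[str, float]], bool]    # defines the regime
    risk_model: Callable[[Dict[str, float]], float]  # per-regime survival risk

rule_list: List[Rule] = [
    Rule(lambda p: p["lactate"] > 4.0, lambda p: 0.30),
    Rule(lambda p: p["age"] > 65,      lambda p: 0.15),
]

def default_risk(p: Dict[str, float]) -> float:      # fallback regime
    return 0.05

def predict_risk(patient: Dict[str, float]) -> float:
    for rule in rule_list:                           # first matching rule wins
        if rule.condition(patient):
            return rule.risk_model(patient)
    return default_risk(patient)

print(predict_risk({"age": 72.0, "lactate": 2.1}))   # 0.15
```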
Citations: 1
Compact CNN Structure Learning by Knowledge Distillation
Pub Date: 2021-01-10 | DOI: 10.1109/ICPR48806.2021.9413006 | Pages: 6554-6561
Waqar Ahmed, Andrea Zunino, Pietro Morerio, V. Murino
The concept of compressing deep Convolutional Neural Networks (CNNs) is essential for using the limited computation, power, and memory resources of embedded devices. However, existing methods achieve this objective at the cost of a drop in inference accuracy on computer vision tasks. To address this drawback, we propose a framework that leverages knowledge distillation along with customizable block-wise optimization to learn a lightweight CNN structure while preserving better control over the compression-performance tradeoff. Given specific resource constraints, e.g., floating-point operations per inference (FLOPs) or model parameters, our method delivers state-of-the-art network compression while achieving better inference accuracy. In a comprehensive evaluation, we demonstrate that our method is effective, robust, and consistent across a variety of network architectures and datasets, at negligible training overhead. In particular, for the already compact MobileNet_v2, our method offers up to 2× and 5.2× better model compression in terms of FLOPs and model parameters, respectively, while achieving 1.05% better model performance than the baseline network.
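For context, a minimal sketch of the distillation objective commonly used in such frameworks: a temperature-softened KL term from teacher to student plus the ordinary cross-entropy. The temperature, weighting, and omission of the paper's block-wise optimization are assumptions.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.7):
    # Softened teacher-to-student KL term; T*T rescales the gradient magnitude.
    soft = F.kl_div(F.log_softmax(student_logits / T, dim=1),
                    F.softmax(teacher_logits / T, dim=1),
                    reduction="batchmean") * (T * T)
    hard = F.cross_entropy(student_logits, labels)   # ordinary supervised term
    return alpha * soft + (1 - alpha) * hard

student = torch.randn(8, 10, requires_grad=True)     # student logits
teacher = torch.randn(8, 10)                         # frozen teacher logits
labels = torch.randint(0, 10, (8,))
print(distillation_loss(student, teacher, labels))
```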
Citations: 3
Separation of Aleatoric and Epistemic Uncertainty in Deterministic Deep Neural Networks
Pub Date: 2021-01-10 | DOI: 10.1109/ICPR48806.2021.9412616 | Pages: 9172-9179
Denis Huseljic, B. Sick, M. Herde, D. Kottke
Despite the success of deep neural networks (DNNs) in many applications, their ability to model uncertainty is still significantly limited. For example, in safety-critical applications such as autonomous driving, it is crucial to obtain predictions that reflect different types of uncertainty in order to address life-threatening situations appropriately. In such cases, it is essential to be aware of both the risk (i.e., aleatoric uncertainty) and the reliability (i.e., epistemic uncertainty) that come with a prediction. We present AE-DNN, a model allowing the separation of aleatoric and epistemic uncertainty while maintaining proper generalization capability. AE-DNN is based on a deterministic DNN, which can determine the respective uncertainty measures in a single forward pass. In analyses with synthetic and image data, we show that our method improves the modeling of epistemic uncertainty while providing an intuitively understandable separation of risk and reliability.
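As background for the aleatoric side, here is a generic heteroscedastic regression head in which the network predicts a mean and a log-variance in one deterministic forward pass and is trained with the Gaussian negative log-likelihood; this is a standard construction, not AE-DNN itself, and the epistemic component is not reproduced.

```python
import torch
import torch.nn as nn

class HeteroscedasticNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(1, 32), nn.Tanh())
        self.mean = nn.Linear(32, 1)
        self.logvar = nn.Linear(32, 1)   # per-input noise level (aleatoric)
    def forward(self, x):
        h = self.body(x)
        return self.mean(h), self.logvar(h)

net = HeteroscedasticNet()
x, y = torch.rand(16, 1), torch.rand(16, 1)
mu, logvar = net(x)
# Gaussian negative log-likelihood; the predicted variance absorbs input noise.
nll = 0.5 * (logvar + (y - mu) ** 2 / logvar.exp()).mean()
nll.backward()
```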
Citations: 14