2020 25th International Conference on Pattern Recognition (ICPR): latest papers, page 5

Attack-agnostic Adversarial Detection on Medical Data Using Explainable Machine Learning

2020 25th International Conference on Pattern Recognition (ICPR)

Pub Date : 2021-01-10 DOI: 10.1109/ICPR48806.2021.9412560

Matthew Watson, N. A. Moubayed

Explainable machine learning has become increasingly prevalent, especially in healthcare, where explainable models are vital for ethical and trusted automated decision making. Work on the susceptibility of deep learning models to adversarial attacks has shown how easy it is to design samples that mislead a model into making incorrect predictions. In this work, we propose a model-agnostic, explainability-based method for the accurate detection of adversarial samples on two datasets with different complexity and properties: Electronic Health Record (EHR) and chest X-ray (CXR) data. On the MIMIC-III and Henan-Renmin EHR datasets, we report a detection accuracy of 77% against the Longitudinal Adversarial Attack. On the MIMIC-CXR dataset, we achieve an accuracy of 88%, significantly improving on the state of the art in adversarial detection on both datasets by over 10% in all settings. We propose an anomaly-detection-based method that uses explainability techniques to detect adversarial samples and is able to generalise to different attack methods without the need for retraining.
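
The idea of running anomaly detection on explanations rather than on raw inputs can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' method: the abstract does not name the explainer or the detector, so the sketch assumes a toy linear model whose per-feature attribution is w_i * (x_i - baseline_i), and a simple z-score detector fitted on attributions of clean samples. All function names and the threshold are hypothetical.

```python
from statistics import mean, pstdev


def attributions(weights, baseline, x):
    """Per-feature attribution for a toy linear model: w_i * (x_i - baseline_i)."""
    return [w * (xi - b) for w, xi, b in zip(weights, x, baseline)]


def fit_detector(clean_attrs):
    """Store the mean and standard deviation of each attribution over clean data."""
    columns = list(zip(*clean_attrs))
    # Guard against a zero standard deviation so scoring never divides by zero.
    return [(mean(col), pstdev(col) or 1e-12) for col in columns]


def anomaly_score(stats, attr):
    """Largest absolute z-score of any feature attribution."""
    return max(abs(a - m) / s for a, (m, s) in zip(attr, stats))


def is_adversarial(stats, attr, threshold=3.0):
    """Flag a sample whose explanation deviates strongly from clean explanations.

    The threshold of 3.0 is an illustrative assumption, not a value from the paper.
    """
    return anomaly_score(stats, attr) > threshold
```

Because the detector is fitted only on explanations of clean data, it needs no examples of any particular attack, which is the property the abstract describes as generalising without retraining.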

Citations: 10

Audio-Visual Predictive Coding for Self-Supervised Visual Representation Learning

2020 25th International Conference on Pattern Recognition (ICPR)

Pub Date : 2021-01-10 DOI: 10.1109/ICPR48806.2021.9413295

M. Tellamekala, M. Valstar, Michael P. Pound, T. Giesbrecht

Self-supervised learning has emerged as a candidate approach for learning semantic visual features from unlabeled video data. In self-supervised learning, intrinsic correspondences between data points are used to define a proxy task that forces the model to learn semantic representations. Most existing proxy tasks applied to video data exploit either intra-modal (e.g. temporal) or cross-modal (e.g. audio-visual) correspondences, but not both. In theory, jointly learning both kinds of correspondence may result in richer visual features; but, as we show in this work, doing so is non-trivial in practice. To address this problem, we introduce ‘Audio-Visual Permutative Predictive Coding’ (AV-PPC), a multi-task learning framework designed to fully leverage the temporal and cross-modal correspondences as natural supervision signals. In AV-PPC, the model is trained to simultaneously learn multiple intra- and cross-modal predictive coding sub-tasks. Using visual speech recognition (lip-reading) as the downstream evaluation task, we show that our proposed proxy task can learn higher-quality visual features than existing proxy tasks. We also show that AV-PPC visual features are highly data-efficient. Without further fine-tuning, the AV-PPC visual encoder achieves an 80.30% spoken word classification rate on the LRW dataset, performing on par with directly supervised visual encoders that are learned from large amounts of labeled data.

Pages: 9912-9919

Citations: 1

Robust Localization of Retinal Lesions via Weakly-supervised Learning

2020 25th International Conference on Pattern Recognition (ICPR)

Pub Date : 2021-01-10 DOI: 10.1109/ICPR48806.2021.9413100

Ruohan Zhao, Qin Li, J. You

Retinal fundus images reveal the condition of the retina, blood vessels and optic nerve, and are becoming widely adopted in clinical work because subtle changes to the structures at the back of the eye can affect the eyes and indicate overall health. Recently, machine learning, in particular deep learning with convolutional neural networks (CNNs), has been increasingly adopted for computer-aided detection (CAD) of retinal lesions. However, a significant barrier to high performance in CNN-based CAD approaches is the lack of sufficient labeled image samples for training. Unlike fully supervised learning, which relies on pixel-level annotation of pathology in fundus images, this paper presents a new approach that localizes various lesions from image-level labels via weakly supervised learning. More specifically, our proposed method leverages multilevel feature maps and the classification score to cope with both bright and red lesions in fundus images. To enhance the capability of learning less discriminative parts of objects (e.g. small blobs of microaneurysms as opposed to the bulk of exudates), the classifier is regularized by refining images with corresponding labels. Experimental results from performance evaluation and benchmarking at both the image level and the pixel level on the public DIARETDB1 dataset demonstrate the feasibility and strong potential of our method in practical use.

Pages: 4613-4618

Citations: 0

Efficient Online Subclass Knowledge Distillation for Image Classification

2020 25th International Conference on Pattern Recognition (ICPR)

Pub Date : 2021-01-10 DOI: 10.1109/ICPR48806.2021.9411995

Maria Tzelepi, N. Passalis, A. Tefas

Deploying state-of-the-art deep learning models on embedded systems imposes certain storage and computation limitations. In recent years, Knowledge Distillation (KD) has been recognized as a prominent approach to address this issue. That is, KD has been effectively used to train fast and compact deep learning models by transferring knowledge from more complex and powerful models. However, knowledge distillation in its conventional form involves multiple stages of training, rendering it a computationally and memory-demanding procedure. In this paper, a novel single-stage self-knowledge distillation method is proposed, namely Online Subclass Knowledge Distillation (OSKD), which aims at revealing the similarities inside classes so as to improve the performance of any deep neural model in an online manner. Hence, as opposed to existing online distillation methods, we are able to acquire further knowledge from the model itself, without building multiple identical models or using multiple models to teach each other, rendering the proposed OSKD approach more efficient. The experimental evaluation on two datasets validates that the proposed method improves classification performance.
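
For readers unfamiliar with the conventional form of knowledge distillation that the abstract contrasts OSKD with, the standard soft-target loss can be sketched as follows. This shows the widely used temperature-scaled formulation, not OSKD itself, whose subclass-based objective is not specified in the abstract. The function names and the default temperature are illustrative assumptions.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over temperature-scaled logits."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, temperature=4.0):
    """KL divergence from the teacher's soft targets to the student's.

    The loss is scaled by temperature squared so that its gradient magnitude
    stays comparable across temperatures.
    """
    teacher = softmax(teacher_logits, temperature)
    student = softmax(student_logits, temperature)
    kl = sum(p * math.log(p / q) for p, q in zip(teacher, student) if p > 0)
    return temperature ** 2 * kl
```

In the conventional setup the teacher logits come from a separately trained, larger model, which is the multi-stage cost the abstract refers to.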

Pages: 1007-1014

Citations: 4

Object-oriented Map Exploration and Construction Based on Auxiliary Task Aided DRL

2020 25th International Conference on Pattern Recognition (ICPR)

Pub Date : 2021-01-10 DOI: 10.1109/ICPR48806.2021.9412299

Junzhe Xu, Jianhua Zhang, Shengyong Chen, Honghai Liu

Environment exploration by autonomous robots using deep reinforcement learning (DRL) based methods has attracted increasing attention. However, existing methods usually focus on robot navigation to single or multiple fixed goals, while ignoring the perception and construction of external environments. In this paper, we propose a novel environment exploration task based on DRL, which requires a robot to quickly and completely perceive all objects of interest and to reconstruct their poses in a global environment map, as far as the robot is able. To this end, we design an auxiliary-task-aided DRL model, which is integrated with auxiliary object detection and 6-DoF pose estimation components. The outcome of the auxiliary tasks can improve the learning speed and robustness of DRL, as well as the accuracy of object pose estimation. Comprehensive experimental results on the indoor simulation platform AI2-THOR show the effectiveness and robustness of our method.

Citations: 0

Open-World Group Retrieval with Ambiguity Removal: A Benchmark

2020 25th International Conference on Pattern Recognition (ICPR)

Pub Date : 2021-01-10 DOI: 10.1109/ICPR48806.2021.9412734

Ling Mei, J. Lai, Zhanxiang Feng, Xiaohua Xie

Group retrieval has attracted plenty of attention in artificial intelligence. Traditional group retrieval research assumes that the members of a group are unique and do not change across cameras. However, this assumption may not hold in practical situations such as open-world and group-ambiguity scenarios. This paper tackles an important yet unstudied problem: re-identifying changing groups of people under open-world and group-ambiguity scenarios in different camera fields. The open-world scenario considers that people who are not targets of the probe set may appear in the search gallery, while the group-ambiguity scenario means that the group members may change. The open-world and group-ambiguity issue is very challenging for existing methods because changes in group membership result in dramatic visual variations. Nevertheless, as far as we know, the existing literature lacks benchmarks that target this issue. In this paper, we propose a new group retrieval dataset named OWGA-Campus to address these challenges. Moreover, we propose a person-to-group similarity matching based ambiguity removal (P2GSM-AR) method to solve these problems and realize group retrieval. Experimental results on the OWGA-Campus dataset demonstrate the effectiveness and robustness of the proposed P2GSM-AR approach in improving the performance of state-of-the-art person re-identification feature extraction methods on the open-world and ambiguous group retrieval task.

Pages: 584-591

Citations: 2

Improved Deep Classwise Hashing With Centers Similarity Learning for Image Retrieval

2020 25th International Conference on Pattern Recognition (ICPR)

Pub Date : 2021-01-10 DOI: 10.1109/ICPR48806.2021.9412086

Ming Zhang, Hong Yan

Deep supervised hashing for image retrieval has attracted researchers' attention due to its high efficiency and superior retrieval performance. Most existing deep supervised hashing works, which are based on pairwise/triplet labels, suffer from expensive computational cost and insufficient utilization of semantic information. Recently, deep classwise hashing introduced a classwise loss supervised by class label information instead; however, we find that it still has drawbacks. In this paper, we propose an improved deep classwise hashing, which enables hashing learning and class center learning simultaneously. Specifically, we design a two-step strategy for center similarity learning. It interacts with the classwise loss to draw each class center toward its intra-class samples while pushing other class centers as far away as possible. The center similarity learning contributes to generating more compact and discriminative hashing codes. We conduct experiments on three benchmark datasets. The results show that the proposed method surpasses the original method and outperforms state-of-the-art baselines under various commonly used evaluation metrics for image retrieval.
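
The retrieval step that makes hashing efficient can be sketched as follows: real-valued embeddings are binarized by sign, and gallery items are ranked by Hamming distance to the query code. This covers only generic hash-based retrieval and assumes the embeddings already exist; it does not implement the classwise loss or the center similarity learning described above. The function names are hypothetical.

```python
def binarize(embedding):
    """Map a real-valued embedding to a {-1, +1} hash code by sign."""
    return [1 if value >= 0 else -1 for value in embedding]


def hamming(code_a, code_b):
    """Number of positions at which two equal-length codes differ."""
    return sum(a != b for a, b in zip(code_a, code_b))


def rank_gallery(query_code, gallery_codes):
    """Return gallery indices ordered from nearest to farthest in Hamming distance."""
    return sorted(
        range(len(gallery_codes)),
        key=lambda index: hamming(query_code, gallery_codes[index]),
    )
```

More compact and discriminative codes, as the abstract puts it, mean that same-class items end up at small Hamming distances from each other, so they appear early in this ranking.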

Pages: 10516-10523

Citations: 3

Single Image Super-Resolution with Dynamic Residual Connection

2020 25th International Conference on Pattern Recognition (ICPR)

Pub Date : 2021-01-10 DOI: 10.1109/ICPR48806.2021.9413244

Karam Park, Jae Woong Soh, N. Cho

Deep convolutional neural networks have shown significant improvement in the single image super-resolution (SISR) field. Recently, there have been attempts to solve the SISR problem using lightweight networks, considering the limited computational resources of real-world applications. Especially for lightweight networks, the balance between parameter demand and performance is very difficult to adjust, and most lightweight SISR networks are manually designed based on a huge number of brute-force experiments. Besides, a key factor in network performance is the skip connection of the building blocks that are repeated in the architecture. Notably, in previous works, these connections are pre-defined and manually determined by human researchers. Hence, they adapt poorly to the input image statistics, and a better solution may exist for the given number of parameters. Therefore, we focus on the automated design of networks with respect to the connection of basic building blocks (residual networks), and as a result, propose a dynamic residual attention network (DRAN). The proposed method allows the network to dynamically select residual paths depending on the input image, based on the idea of the attention mechanism. For this, we design a dynamic residual module that determines the residual paths between the basic building blocks for the given input image. By finding optimal residual paths between the blocks, the network can selectively bypass informative features needed to reconstruct the target high-resolution (HR) image. Experimental results show that our proposed DRAN outperforms most of the existing state-of-the-art lightweight models in SISR.
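
One way to picture input-dependent residual paths is as gated skip connections, where each gate is computed from the input itself. The sketch below is a loose illustration under strong assumptions, not the DRAN module: it treats feature maps as flat lists, derives one sigmoid gate per candidate path from the global average of the input, and adds the gated earlier outputs to the current block's output. The function names and the `(weight, bias)` gate parameters are hypothetical.

```python
import math


def sigmoid(x):
    """Logistic function mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def path_gates(feature_map, gate_params):
    """Compute one gate per candidate residual path from the pooled input.

    gate_params is a list of (weight, bias) pairs, one pair per path.
    """
    pooled = sum(feature_map) / len(feature_map)  # global average pooling
    return [sigmoid(weight * pooled + bias) for weight, bias in gate_params]


def dynamic_residual(block_output, earlier_outputs, gates):
    """Add each earlier block's output, scaled by its gate, to the current output."""
    result = list(block_output)
    for gate, skipped in zip(gates, earlier_outputs):
        for position, value in enumerate(skipped):
            result[position] += gate * value
    return result
```

A gate near 0 effectively removes a skip connection for that input and a gate near 1 keeps it, which is how a fixed architecture can behave like different connection patterns for different images.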

Pages: 1-8

Citations: 1

Fractional Adaptation of Activation Functions In Neural Networks

2020 25th International Conference on Pattern Recognition (ICPR)

Pub Date : 2021-01-10 DOI: 10.1109/ICPR48806.2021.9413338

J. Zamora-Esquivel, Jesus Adan Cruz Vargas, P. López-Meyer

In this work, we introduce a generalization methodology for the automatic selection of the activation functions inside a neural network, taking advantage of concepts defined in fractional calculus. This methodology enables the neural network to search for and optimize its own activation functions during the training process, by defining the fractional order of the derivative of a given primitive activation function. This fractional order is tuned as an additional training hyper-parameter: $a$ for intra-family selection and $b$ for cross-family selection. By following this approach, the neurons inside the network can adjust their activation functions, e.g. from MLP to RBF networks, to best fit the input data and reduce the output error. The experimental results show the benefits of this technique implemented on a ResNet18 topology, which outperforms the accuracy of a ResNet100 trained with CIFAR10 and improves by 1% on the ImageNet results reported in the literature.

Citations: 1

Weight Estimation from an RGB-D camera in top-view configuration

2020 25th International Conference on Pattern Recognition (ICPR)

Pub Date : 2021-01-10 DOI: 10.1109/ICPR48806.2021.9412519

M. Mameli, M. Paolanti, N. Conci, Filippo Tessaro, E. Frontoni, P. Zingaretti

The development of so-called soft biometrics aims at providing information related to the physical and behavioural characteristics of a person. This paper focuses on body weight estimation based on observations from a top-view RGB-D camera. The capability to estimate the weight of a person can help in many different applications, from health-related scenarios to business intelligence and retail analytics. To deal with this issue, a TVWE (Top-View Weight Estimation) framework is proposed with the aim of predicting the weight. The approach relies on Deep Neural Networks (DNNs) that have been trained on depth data. Each network has also been modified in its top section to replace classification with prediction inference. The performance of five state-of-the-art DNNs has been compared, namely VGG16, ResNet, Inception, DenseNet and Efficient-Net. In addition, a convolutional auto-encoder has been included for completeness. Considering the limited literature in this domain, the TVWE framework has been evaluated on a new publicly available dataset, the “VRAI Weight estimation Dataset”, which also collects, for each subject, labels related to weight, gender, and height. The experimental results demonstrate that the proposed methods are suitable for this task, bringing different and significant insights for the application of the solution in different domains.

Pages: 7715-7722

Citations: 1