Pub Date: 2021-01-10. DOI: 10.1109/ICPR48806.2021.9413267
C. Ojeda, Ramsés J. Sánchez, K. Cvejoski, J. Schücker, C. Bauckhage, B. Georgiev
In this paper we ask what the main factors are that determine a classifier’s decision-making process, and uncover such factors by studying latent codes produced by auto-encoding frameworks. To explain a classifier’s behaviour, we propose a method that provides a series of examples highlighting semantic differences between the classifier’s decisions. These examples are generated through interpolations in latent space. We introduce and formalize the notion of a semantic stochastic path as a suitable stochastic process, defined in feature (data) space via latent code interpolations. We then introduce the concept of semantic Lagrangians as a way to incorporate the desired classifier behaviour, and find that the solution of the associated variational problem highlights differences in the classifier’s decisions. Importantly, within our framework the classifier is used as a black box: only its evaluation is required.
"Auto Encoding Explanatory Examples with Stochastic Paths," 2020 25th International Conference on Pattern Recognition (ICPR), pp. 6219-6226.
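As a rough illustration of the latent-path idea described above, the sketch below interpolates between two auto-encoder latent codes and probes a black-box classifier along the decoded images. It is not the authors' method: the semantic Lagrangian and the variational problem are omitted, a plain linear interpolation stands in for the optimised stochastic path, and the encoder, decoder and classifier are untrained placeholders.

```python
# Minimal sketch (not the authors' code): build an explanatory path by interpolating
# auto-encoder latent codes and evaluating a black-box classifier on the decoded images.
import torch
import torch.nn as nn

latent_dim, img_dim = 16, 28 * 28
encoder = nn.Sequential(nn.Flatten(), nn.Linear(img_dim, latent_dim))   # placeholder encoder
decoder = nn.Sequential(nn.Linear(latent_dim, img_dim), nn.Sigmoid())   # placeholder decoder
classifier = nn.Sequential(nn.Flatten(), nn.Linear(img_dim, 10))        # black box: only evaluated

x_start = torch.rand(1, 1, 28, 28)   # example assigned to one class
x_end = torch.rand(1, 1, 28, 28)     # example assigned to another class

with torch.no_grad():
    z0, z1 = encoder(x_start), encoder(x_end)
    for t in torch.linspace(0.0, 1.0, steps=8):
        z_t = (1 - t) * z0 + t * z1            # point on the latent path
        x_t = decoder(z_t).view(1, 1, 28, 28)  # decoded "explanatory example"
        probs = classifier(x_t).softmax(dim=1) # black-box evaluation only
        print(f"t={t.item():.2f}  top class={probs.argmax().item()}  p={probs.max().item():.3f}")
```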
First-person-view (FPV) cameras are finding wide use in daily life to record activities and sports. In this paper, we propose a succinct and robust 3D convolutional neural network (CNN) architecture, accompanied by an ensemble-learning network, for activity recognition from FPV videos. The proposed 3D CNN is trained on low-resolution (32 × 32) sparse optical flows using FPV video datasets consisting of daily activities. According to the experimental results, our network achieves an average accuracy of 90%.
"Activity Recognition Using First-Person-View Cameras Based on Sparse Optical Flows," Peng Yua Kao, Yan-Jing Lei, Chia-Hao Chang, Chu-Song Chen, Ming-Sui Lee, Y. Hung. 2020 25th International Conference on Pattern Recognition (ICPR), pp. 81-86. DOI: 10.1109/ICPR48806.2021.9412330
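A minimal sketch of the kind of compact 3D CNN the abstract describes, assuming 2-channel (dx, dy) flow fields of size 32 × 32 stacked over 16 frames; the layer sizes are illustrative choices and the ensemble-learning head is omitted, so this is not the paper's exact architecture.

```python
# Compact 3D CNN over low-resolution optical-flow clips (illustrative sizes only).
import torch
import torch.nn as nn

class Flow3DCNN(nn.Module):
    def __init__(self, num_classes: int = 10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv3d(2, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool3d(2),                               # -> 16 x 8 x 16 x 16
            nn.Conv3d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool3d(2),                               # -> 32 x 4 x 8 x 8
        )
        self.classifier = nn.Linear(32 * 4 * 8 * 8, num_classes)

    def forward(self, x):                  # x: (batch, 2, frames, 32, 32)
        h = self.features(x).flatten(1)
        return self.classifier(h)

model = Flow3DCNN()
flows = torch.randn(4, 2, 16, 32, 32)      # a batch of sparse optical-flow clips
print(model(flows).shape)                  # -> torch.Size([4, 10])
```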
Existing CNN-based methods for semantic segmentation heavily depend on multi-scale features to meet the requirements of both semantic comprehension and detail preservation. State-of-the-art segmentation networks widely exploit conventional scale-transfer operations, i.e., up-sampling and down-sampling, to learn multi-scale features. In this work, we find that these operations lead to scale-confused features and suboptimal performance because they are spatially invariant and transfer all feature information across scales without spatial selection. To address this issue, we propose the Gated Scale-Transfer Operation (GSTO), which transfers spatially filtered features to another scale. GSTO can work either with or without extra supervision: the unsupervised variant learns its gate from the feature itself, while the supervised one is guided by a supervised probability matrix. Both forms of GSTO are lightweight and plug-and-play, and can be flexibly integrated into networks or modules to learn better multi-scale features. In particular, by plugging GSTO into HRNet, we obtain a more powerful backbone (GSTO-HRNet) for pixel labeling, which achieves new state-of-the-art results on multiple semantic segmentation benchmarks, including Cityscapes, LIP, and Pascal Context, with negligible extra computational cost. Moreover, experimental results demonstrate that GSTO can also significantly boost the performance of multi-scale feature aggregation modules such as PPM and ASPP.
"GSTO: Gated Scale-Transfer Operation for Multi-Scale Feature Learning in Semantic Segmentation," Zhuoying Wang, Yongtao Wang, Zhi Tang, Yangyan Li, Ying Chen, Haibin Ling, Weisi Lin. 2020 25th International Conference on Pattern Recognition (ICPR), pp. 7111-7118. DOI: 10.1109/ICPR48806.2021.9412965
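The following sketch shows one way a gated scale-transfer step could look in the unsupervised setting described above: a gate predicted from the feature itself spatially filters the feature before it is up-sampled. The exact gating mechanism and the supervised variant used in the paper may differ.

```python
# Illustrative gated up-sampling step: spatial selection before scale transfer.
import torch
import torch.nn as nn
import torch.nn.functional as F

class GatedUpsample(nn.Module):
    def __init__(self, channels: int, scale: int = 2):
        super().__init__()
        self.gate = nn.Conv2d(channels, 1, kernel_size=1)  # gate predicted from the feature itself
        self.scale = scale

    def forward(self, x):                                   # x: (B, C, H, W)
        g = torch.sigmoid(self.gate(x))                     # spatial gate in [0, 1]
        x = x * g                                           # pass only spatially selected information
        return F.interpolate(x, scale_factor=self.scale, mode="bilinear", align_corners=False)

feat = torch.randn(2, 64, 32, 32)
print(GatedUpsample(64)(feat).shape)    # -> torch.Size([2, 64, 64, 64])
```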
Pub Date: 2021-01-10. DOI: 10.1109/ICPR48806.2021.9413038
Minori Narita, Daiki Kimura, T. Imamura
Various anomaly detection methods utilizing different types of images have recently been proposed. However, anomaly detection in planetary science is still done predominantly by the human eye, because explainability is crucial in the physical sciences and most of today's deep-learning-based anomaly detection methods cannot offer enough of it. Moreover, preparing the large number of images required to fully exploit anomaly detection is not always feasible. In this work, we propose a new framework that automatically detects large bow-shaped structures (stationary waves) appearing on the surface of the Venus clouds by applying a variational auto-encoder (VAE) and attention maps to anomaly detection. We also discuss the advantages of using image augmentation. Experiments show that our approach can achieve higher accuracy than state-of-the-art methods even when anomaly images are scarce. On the basis of this finding, we discuss anomaly detection frameworks particularly suited to physical-science domains.
"Automatic Detection of Stationary Waves in the Venus Atmosphere Using Deep Generative Models," 2020 25th International Conference on Pattern Recognition (ICPR), pp. 2912-2919.
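A minimal sketch of VAE-based anomaly scoring in the spirit of this framework: a tiny VAE is assumed to have been trained on normal cloud patches, and patches it reconstructs poorly receive high anomaly scores. The attention-map component and the augmentation strategy are omitted.

```python
# Reconstruction-error anomaly scoring with a tiny (untrained placeholder) VAE.
import torch
import torch.nn as nn

class TinyVAE(nn.Module):
    def __init__(self, img_dim=64 * 64, latent_dim=32):
        super().__init__()
        self.enc = nn.Linear(img_dim, 2 * latent_dim)               # outputs mean and log-variance
        self.dec = nn.Sequential(nn.Linear(latent_dim, img_dim), nn.Sigmoid())

    def forward(self, x):
        mu, logvar = self.enc(x).chunk(2, dim=1)
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()        # reparameterisation trick
        return self.dec(z), mu, logvar

vae = TinyVAE()
patches = torch.rand(8, 64 * 64)                        # flattened cloud-image patches
recon, mu, logvar = vae(patches)
anomaly_score = ((patches - recon) ** 2).mean(dim=1)    # high score = poorly reconstructed = anomalous
print(anomaly_score)
```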
Pub Date: 2021-01-10. DOI: 10.1109/ICPR48806.2021.9412879
Gabrielle Flood, David Gillsjö, Patrik Persson, A. Heyden, K. Åström
With the development of cheap image sensors, the amount of available image data has increased enormously, and the possibility of using crowdsourced collection methods has emerged. This calls for ways to handle all these data. In this paper, we present new tools that enable efficient, flexible and robust map merging. Assuming that separate optimisations have been performed for the individual maps, we show how only the relevant data can be stored in a low-memory-footprint representation. We use these representations to perform map merging so that the algorithm is invariant to the merging order and independent of the choice of coordinate system. The result is a robust algorithm that can be applied to several maps simultaneously. The result of a merge can itself be represented in the same low-memory-footprint format, which enables further merging and updating of the map in a hierarchical way. Furthermore, the method can perform loop closing and detect changes in the scene between the capture of the different image sequences. Using both simulated and real data, from a handheld mobile phone and from a drone, we verify the performance of the proposed method.
"Generic Merging of Structure from Motion Maps with a Low Memory Footprint," 2020 25th International Conference on Pattern Recognition (ICPR), pp. 4385-4392.
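The coordinate-system independence mentioned above can be illustrated with a much simpler building block: aligning two partial maps that share some 3D landmarks via orthogonal Procrustes. This is only a conceptual stand-in, not the paper's compressed-representation merging algorithm.

```python
# Align map B into the frame of map A using shared landmarks (Kabsch / orthogonal Procrustes).
import numpy as np

def align_maps(points_a: np.ndarray, points_b: np.ndarray):
    """Return R, t mapping points_b into the frame of points_a (shared landmarks, Nx3)."""
    ca, cb = points_a.mean(axis=0), points_b.mean(axis=0)
    H = (points_b - cb).T @ (points_a - ca)
    U, _, Vt = np.linalg.svd(H)
    R = Vt.T @ U.T
    if np.linalg.det(R) < 0:               # guard against a reflection solution
        Vt[-1] *= -1
        R = Vt.T @ U.T
    t = ca - R @ cb
    return R, t

rng = np.random.default_rng(0)
shared = rng.normal(size=(20, 3))                          # landmarks seen in both maps
map_b = shared + np.array([1.0, -2.0, 0.5])                # map B is the same points, translated
R, t = align_maps(shared, map_b)
print(np.allclose(map_b @ R.T + t, shared, atol=1e-6))     # -> True
```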
Pub Date: 2021-01-10. DOI: 10.1109/ICPR48806.2021.9412464
O. Dehzangi, Arash Shokouhmand, P. Jeihouni, J. Ramadan, V. Finomore, N. Nasrabadi, A. Rezai
Dealing with opioid addiction and its long-term consequences is of great importance, as addiction to opioids emerges gradually and becomes strongly established in a patient's body. Recent research indicates that quitting opioids requires clinicians to arrange a gradual plan for patients coping with the difficulties of overcoming addiction. This, in turn, necessitates observing the patients' wellness periodically, which is conventionally done by setting clinical appointments. With the advent of wearable sensors, continuous patient monitoring becomes possible. However, the data collected through the sensors are pervasively noisy, and using sensors with different sampling frequencies complicates the data processing. In this work, we handle this problem by using data from cognitive tests, along with heart rate (HR) and heart rate variability (HRV). The proposed recipe lets us interpret the data as a feature space in which we predict the wellness of opioid patients by employing extreme gradient boosting (XGBoost), which achieves an average prediction accuracy of 96.12% as the best performance.
"XGBoost to Interpret the Opioid Patients' State Based on Cognitive and Physiological Measures," 2020 25th International Conference on Pattern Recognition (ICPR), pp. 6391-6395.
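A minimal sketch of the prediction step, assuming a tabular feature space built from cognitive-test scores and HR/HRV statistics with a binary wellness label; the feature names and the synthetic data below are hypothetical, not the study's dataset.

```python
# Gradient-boosted wellness prediction on synthetic cognitive + HR/HRV features.
import numpy as np
from xgboost import XGBClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)
# Hypothetical columns: [reaction_time, recall_score, mean_HR, SDNN, RMSSD, pNN50]
X = rng.normal(size=(200, 6))
y = (X[:, 2] + 0.5 * X[:, 3] > 0).astype(int)   # synthetic stand-in for the wellness label

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)
model = XGBClassifier(n_estimators=200, max_depth=3, learning_rate=0.1)
model.fit(X_tr, y_tr)
print("accuracy:", accuracy_score(y_te, model.predict(X_te)))
```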
Pub Date: 2021-01-10. DOI: 10.1109/ICPR48806.2021.9412798
Xin Li, Xiangfeng Wang, Bo Jin, Wenjie Zhang, Jun Wang, H. Zha
Zero-shot hashing aims at learning a hashing model from seen classes such that the obtained model generalizes to unseen classes for image retrieval. Inspired by zero-shot learning, existing zero-shot hashing methods usually transfer supervised knowledge from seen to unseen classes by embedding the Hamming space into a shared semantic space. However, this makes instances difficult to distinguish due to the limited number of hashing bits, especially for semantically similar unseen classes. We propose a novel inductive zero-shot hashing framework, VSB2-Net, in which both the semantic space and the visual feature space are instead embedded into the same Hamming space. Reconstructive semantic relationships are established in the Hamming space, preserving local similarity relationships and explicitly enlarging the discrepancy between semantic Hamming vectors. A two-task architecture, comprising a classification module and a visual feature reconstruction module, is employed to enhance generalization and transfer ability. Extensive evaluation on several benchmark datasets demonstrates the superiority of our proposed method over several state-of-the-art baselines.
"VSB2-Net: Visual-Semantic Bi-Branch Network for Zero-Shot Hashing," 2020 25th International Conference on Pattern Recognition (ICPR), pp. 1836-1843.
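A conceptual sketch of the shared-Hamming-space idea: a visual feature and a class-semantic vector are projected to the same code length and binarised so they can be compared by Hamming distance. The branch designs, the reconstructive relationships and the two-task training are not reproduced; the projections below are untrained placeholders.

```python
# Project both modalities to the same Hamming space and retrieve by Hamming distance.
import torch
import torch.nn as nn

code_bits = 32
visual_branch = nn.Linear(2048, code_bits)       # image feature -> hash logits (placeholder)
semantic_branch = nn.Linear(300, code_bits)      # word vector / attributes -> hash logits (placeholder)

img_feat = torch.randn(1, 2048)
class_sem = torch.randn(5, 300)                  # semantic vectors for 5 unseen classes

with torch.no_grad():
    img_code = torch.sign(visual_branch(img_feat))          # binary code in {-1, +1}^32
    class_codes = torch.sign(semantic_branch(class_sem))
    hamming = (code_bits - img_code @ class_codes.T) / 2    # Hamming distance to each class code
    print("nearest class:", hamming.argmin().item())
```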
Pub Date: 2021-01-10. DOI: 10.1109/ICPR48806.2021.9412377
Xinyi Lu, Linlin Huang, Fei Yin
Offline signature verification, which determines whether a handwritten signature image is genuine or forged for a claimed identity, is needed in many applications. How to extract salient features and how to calculate similarity scores are the major issues. In this paper, we propose a novel end-to-end cut-and-compare network for offline signature verification. Based on the Spatial Transformer Network (STN), discriminative regions are segmented from a pair of input signature images and compared attentively with the help of an Attentive Recurrent Comparator (ARC). An adaptive distance fusion module is proposed to fuse the distances of these regions. To address the intra-personal variability problem, we design a smoothed double-margin loss to train the network. The proposed network achieves state-of-the-art performance on the CEDAR, GPDS Synthetic, BHSig-H and BHSig-B datasets, which cover different languages. Furthermore, our network shows strong generalization ability in cross-language tests.
"Cut and Compare: End-to-end Offline Signature Verification Network," 2020 25th International Conference on Pattern Recognition (ICPR), pp. 3589-3596.
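One plausible form of a double-margin loss of the kind mentioned above is sketched below: genuine pairs are penalised only when their distance exceeds an inner margin, forged pairs only when they fall inside an outer margin. The exact smoothed formulation used in the paper may differ.

```python
# Double-margin contrastive-style loss on pairwise distances (illustrative margins).
import torch

def double_margin_loss(dist, label, m_genuine=0.5, m_forged=1.5):
    """dist: pairwise distances; label: 1 for a genuine pair, 0 for a forged pair."""
    pos = label * torch.clamp(dist - m_genuine, min=0) ** 2        # pull genuine pairs inside m_genuine
    neg = (1 - label) * torch.clamp(m_forged - dist, min=0) ** 2   # push forged pairs outside m_forged
    return (pos + neg).mean()

dist = torch.tensor([0.3, 0.9, 1.8, 0.7])
label = torch.tensor([1.0, 1.0, 0.0, 0.0])
print(double_margin_loss(dist, label))
```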
Pub Date: 2021-01-10. DOI: 10.1109/ICPR48806.2021.9412067
Zifan Yu, Suya You
Accurate and robust detection of objects from monocular images is a fundamental vision task. This paper describes a novel approach to holistic scene understanding that simultaneously achieves scene reconstruction and object detection from a single monocular image. Rather than pursuing an independent solution for each individual task, as most existing work does, we seek a globally optimal solution that holistically resolves the multiple perception and reasoning tasks in an effective manner. The approach exploits the complementary properties of multimodal RGB images and depth data to improve scene perception. It uniquely combines canonical correlation analysis and deep learning to learn the most correlated features, maximizing the cross-modal correlation to improve the performance and robustness of object detection in complex environments. Extensive experiments have been conducted to evaluate and demonstrate the performance of the proposed approach.
"Object Detection on Monocular Images with Two-Dimensional Canonical Correlation Analysis," 2020 25th International Conference on Pattern Recognition (ICPR), pp. 5905-5912.
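As a toy illustration of the correlation step only, classical CCA can find projections of RGB and depth feature vectors that are maximally correlated; the paper's two-dimensional variant and its integration with the detector are not reproduced here, and the synthetic features below are hypothetical.

```python
# Classical CCA between synthetic RGB and depth feature vectors sharing a latent structure.
import numpy as np
from sklearn.cross_decomposition import CCA

rng = np.random.default_rng(0)
shared = rng.normal(size=(500, 4))                                   # structure shared by both modalities
rgb_feats = shared @ rng.normal(size=(4, 64)) + 0.1 * rng.normal(size=(500, 64))
depth_feats = shared @ rng.normal(size=(4, 32)) + 0.1 * rng.normal(size=(500, 32))

cca = CCA(n_components=4)
rgb_c, depth_c = cca.fit_transform(rgb_feats, depth_feats)
corr = [np.corrcoef(rgb_c[:, i], depth_c[:, i])[0, 1] for i in range(4)]
print("canonical correlations:", np.round(corr, 3))                  # close to 1 for shared components
```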
Pub Date: 2021-01-10. DOI: 10.1109/ICPR48806.2021.9411925
Shichang Tang, Xueying Zhou, Xuming He, Yi Ma
In this paper, we look into the problem of disentangled representation learning and controllable image synthesis in a deep generative model. We develop an encoder-decoder architecture for a variant of the Variational Auto-Encoder (VAE) with two latent codes $z_{1}$ and $z_{2}$. Our framework uses $z_{2}$ to capture specified factors of variation while $z_{1}$ captures the complementary factors of variation. To this end, we analyze the learning problem from the perspective of multivariate mutual information, derive optimizable lower bounds of the conditional mutual information in the image synthesis process, and incorporate them into the training objective. We validate our method empirically on the Color MNIST and CelebA datasets by showing controllable image syntheses. Our proposed paradigm is simple yet effective and is applicable to many situations, including those where no explicit factorization of features is available or where the features are non-categorical.
"Disentangled Representation Learning for Controllable Image Synthesis: An Information-Theoretic Perspective," 2020 25th International Conference on Pattern Recognition (ICPR), pp. 10042-10049.
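A minimal sketch of the two-code idea, assuming a VAE-style encoder whose latent is split into $z_{1}$ (complementary factors) and $z_{2}$ (specified factors); swapping $z_{2}$ between two images at decoding time is what makes the synthesis controllable. The mutual-information lower bounds in the training objective are omitted, and the networks are untrained placeholders.

```python
# Two-code VAE skeleton: split latent into z1 (complementary) and z2 (specified) and swap z2.
import torch
import torch.nn as nn

class TwoCodeVAE(nn.Module):
    def __init__(self, img_dim=28 * 28, d1=8, d2=8):
        super().__init__()
        self.d1, self.d2 = d1, d2
        self.enc = nn.Linear(img_dim, 2 * (d1 + d2))              # means and log-variances
        self.dec = nn.Sequential(nn.Linear(d1 + d2, img_dim), nn.Sigmoid())

    def encode(self, x):
        mu, logvar = self.enc(x).chunk(2, dim=1)
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()
        return z[:, :self.d1], z[:, self.d1:]                     # (z1, z2)

    def decode(self, z1, z2):
        return self.dec(torch.cat([z1, z2], dim=1))

vae = TwoCodeVAE()
x_a, x_b = torch.rand(1, 28 * 28), torch.rand(1, 28 * 28)
z1_a, z2_a = vae.encode(x_a)
_, z2_b = vae.encode(x_b)
swapped = vae.decode(z1_a, z2_b)    # keep x_a's complementary factors, take x_b's specified factors
print(swapped.shape)                # -> torch.Size([1, 784])
```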