
Latest publications: 2020 25th International Conference on Pattern Recognition (ICPR)

IBN-STR: A Robust Text Recognizer for Irregular Text in Natural Scenes
Pub Date : 2021-01-10 DOI: 10.1109/ICPR48806.2021.9412775
Xiaoqian Li, Jie Liu, Guixuan Zhang, Shuwu Zhang
Although text recognition methods based on deep neural networks have shown promising performance, challenges remain due to the variety of text styles, perspective distortion, text with large curvature, and so on. To obtain a robust text recognizer, we improve performance from two aspects: data and feature representation. In terms of data, we transform the input images into S-shape distorted images to increase the diversity of the training data, and we explore the effects of different training data. In terms of feature representation, the combination of instance normalization and batch normalization improves the model's capacity and generalization ability. This paper proposes IBN-STR, a robust attention-based scene text recognizer. Through extensive experiments, we analyze and compare the model from the data and feature representation perspectives, and verify the effectiveness of IBN-STR on both regular and irregular text instances. Furthermore, IBN-STR is an end-to-end recognition system that achieves state-of-the-art performance.
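The abstract's key feature-representation idea, combining instance normalization and batch normalization, can be sketched as an IBN-Net-style channel split (an illustrative numpy sketch, not the authors' implementation; the 50/50 split ratio is an assumption):

```python
import numpy as np

def instance_norm(x, eps=1e-5):
    # normalize each (sample, channel) map over its spatial dimensions
    mean = x.mean(axis=(2, 3), keepdims=True)
    var = x.var(axis=(2, 3), keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

def batch_norm(x, eps=1e-5):
    # normalize each channel over the whole batch and spatial dimensions
    mean = x.mean(axis=(0, 2, 3), keepdims=True)
    var = x.var(axis=(0, 2, 3), keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

def ibn(x, ratio=0.5):
    # IBN-Net-style split: the first `ratio` of the channels get instance
    # normalization (style invariance), the rest get batch normalization
    c = int(x.shape[1] * ratio)
    return np.concatenate([instance_norm(x[:, :c]), batch_norm(x[:, c:])], axis=1)

feat = np.random.default_rng(0).normal(size=(4, 8, 16, 16))  # (N, C, H, W)
out = ibn(feat)
```

Instance normalization makes the first half of the channels invariant to per-image appearance (style), while batch normalization on the second half preserves discriminative content statistics.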
Citations: 0
Attention Based Multi-Instance Thyroid Cytopathological Diagnosis with Multi-Scale Feature Fusion
Pub Date : 2021-01-10 DOI: 10.1109/ICPR48806.2021.9413184
Shuhao Qiu, Yao Guo, Chuang Zhu, Wenli Zhou, Huang Chen
In recent years, deep learning has become popular in combination with cytopathology diagnosis. Using whole slide images (WSIs) scanned by electronic scanners at clinics, researchers have developed many algorithms to classify a slide as benign or malignant. However, the key area that supports the diagnosis can be relatively small in a thyroid WSI, and only the global label can be acquired, which makes direct use of a strongly supervised learning framework infeasible. Moreover, because the clinical diagnosis of thyroid cells requires visual features at different scales, a generic feature extraction scheme may not achieve good performance. In this paper, we propose a weakly supervised multi-instance learning framework based on an attention mechanism with multi-scale feature fusion (MSF), using a convolutional neural network (CNN), for thyroid cytopathological diagnosis. We take each WSI as a bag whose instances are different regions of the WSI; our framework is trained to learn the key areas automatically and make the classification. We also propose a feature fusion structure that merges low-level features into the final feature map and adds an instance-level attention module, which improves classification accuracy. Our model is trained and tested on collected clinical data and reaches an accuracy of 93.2%, outperforming other existing methods. We also tested our model on a public histopathology dataset and achieved better results than the state-of-the-art deep multi-instance method.
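The bag-of-instances idea with attention pooling can be illustrated with a minimal numpy sketch (a generic attention-MIL pooling in the style of common MIL formulations, not the paper's exact network; all shapes and parameter names here are hypothetical):

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def attention_mil_pool(instances, w, v):
    # instances: (K, D) features of K regions (instances) from one WSI (the bag)
    # tanh attention: score_k = w . tanh(V h_k); weights sum to 1 over the bag
    scores = np.array([w @ np.tanh(v @ h) for h in instances])
    alpha = softmax(scores)
    bag_feature = alpha @ instances  # attention-weighted bag-level feature
    return bag_feature, alpha

rng = np.random.default_rng(0)
K, D, H = 6, 16, 8  # bag size, instance feature dim, attention hidden dim
inst = rng.normal(size=(K, D))
bag, alpha = attention_mil_pool(inst, rng.normal(size=H), rng.normal(size=(H, D)))
```

A bag-level classifier then operates on `bag`, and the learned `alpha` indicates which regions drove the decision, which is how the framework can "learn the key area automatically" from only a global slide label.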
Citations: 2
Multi-Attribute Regression Network for Face Reconstruction
Pub Date : 2021-01-10 DOI: 10.1109/ICPR48806.2021.9412668
Xiangzheng Li, Suping Wu
In this paper, we propose a multi-attribute regression network (MARN) to investigate the problem of face reconstruction, especially in challenging cases where faces undergo large variations, including severe poses, extreme expressions, and partial occlusions in unconstrained environments. Traditional 3DMM parametric regression methods do not distinguish the learning of identity, expression, and pose attributes, so the reconstructed face lacks geometric details. We propose to learn multi-attribute face features during 3D face reconstruction from single 2D images. Our MARN enables the network to better extract feature information for face identity, expression, and pose attributes. We introduce three loss functions to constrain these three face attributes respectively. At the same time, we carefully design a geometric contour constraint loss function, using constraints from sparse 2D face landmarks to improve the reconstructed geometric contour information. The experimental results show that our MARN achieves significant improvements in 3D face reconstruction and face alignment on the AFLW2000-3D and AFLW datasets.
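The abstract names three attribute losses plus a geometric contour constraint; a weighted multi-term objective of this general shape might look like the following (purely illustrative; the paper's actual loss terms and weights are not given here, and plain MSE per term is an assumption):

```python
import numpy as np

def multi_attribute_loss(preds, targets, weights=(1.0, 1.0, 1.0, 1.0)):
    # weighted sum over four hypothetical terms: identity, expression, pose,
    # and the 2D-landmark contour constraint; each term is plain MSE here
    terms = [np.mean((p - t) ** 2) for p, t in zip(preds, targets)]
    return float(sum(w * t for w, t in zip(weights, terms)))

preds = [np.zeros(5) for _ in range(4)]
targets = [np.ones(5) for _ in range(4)]
loss = multi_attribute_loss(preds, targets)
```

Separating the objective into per-attribute terms is what lets each attribute be supervised independently instead of being entangled in a single 3DMM parameter regression loss.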
Citations: 7
BiLuNet: A Multi-path Network for Semantic Segmentation on X-ray Images
Pub Date : 2021-01-10 DOI: 10.1109/ICPR48806.2021.9412027
V. Tran, Huei-Yung Lin, Hsiao-Wei Liu, Fang-Jie Jang, Chun-Han Tseng
Semantic segmentation and shape detection of the lumbar vertebrae, sacrum, and femoral heads from clinical X-ray images are important and challenging tasks. In this paper, we propose a new multi-path convolutional neural network, BiLuNet, for semantic segmentation on X-ray images. The network is capable of medical image segmentation with very limited training data. With shape fitting of the bones, we can identify the locations of the target regions very accurately for lumbar vertebra inspection. We collected our own dataset, annotated by doctors, for model training and performance evaluation. Compared to state-of-the-art methods, the proposed technique provides better mIoUs and higher success rates with the same training data. The experimental results demonstrate the feasibility of our network for semantic segmentation of the lumbar vertebrae, sacrum, and femoral heads. Code is available at: https://github.com/LuanTran07/BiLUnet-Lumbar-Spine.
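The comparison metric, mIoU, is the per-class intersection-over-union averaged over classes; a minimal sketch of the standard definition:

```python
import numpy as np

def mean_iou(pred, gt, num_classes):
    # pred, gt: integer label maps of identical shape; classes absent from
    # both maps are skipped so they do not distort the average
    ious = []
    for c in range(num_classes):
        inter = np.logical_and(pred == c, gt == c).sum()
        union = np.logical_or(pred == c, gt == c).sum()
        if union > 0:
            ious.append(inter / union)
    return float(np.mean(ious))

pred = np.array([[0, 0], [1, 1]])
gt = np.array([[0, 0], [1, 0]])
score = mean_iou(pred, gt, num_classes=2)  # (2/3 + 1/2) / 2
```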
Citations: 3
AOAM: Automatic Optimization of Adjacency Matrix for Graph Convolutional Network
Pub Date : 2021-01-10 DOI: 10.1109/ICPR48806.2021.9412046
Yuhang Zhang, Hongshuai Ren, Jiexia Ye, Xitong Gao, Yang Wang, Kejiang Ye, Chengzhong Xu
Graph Convolutional Networks (GCNs) are adopted to tackle convolution in non-Euclidean space. Previous works on GCNs have made some progress; however, one of their limitations is that designing the Adjacency Matrix (AM) used as GCN input requires domain knowledge, and the process is cumbersome, tedious, and error-prone. In addition, the entries of a fixed Adjacency Matrix are generally binary values (i.e., ones and zeros), which cannot reflect the real relationships between nodes. Meanwhile, many applications require a weighted, dynamic Adjacency Matrix rather than an unweighted, fixed one, and few works focus on designing a more flexible Adjacency Matrix. To that end, we propose an end-to-end algorithm that improves GCN performance by focusing on the Adjacency Matrix. We first provide a calculation method called node information entropy to update the matrix. Then, we perform the search strategy in a continuous space and introduce the Deep Deterministic Policy Gradient (DDPG) method to overcome the drawbacks of discrete-space search. Finally, we integrate the GCN and reinforcement learning into an end-to-end framework. Our method can automatically define the Adjacency Matrix without prior knowledge. At the same time, the proposed approach can deal with a matrix of any size and provide a better AM for the network. Four popular datasets are selected to evaluate the capability of our algorithm. The method in this paper achieves state-of-the-art performance on the Cora and Pubmed datasets, with accuracies of 84.6% and 81.6%, respectively.
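For context, the standard GCN propagation rule that consumes such an Adjacency Matrix, whether binary or weighted, is H' = ReLU(D^{-1/2}(A+I)D^{-1/2} X W). A minimal numpy sketch (a generic GCN layer, not the paper's AOAM optimization):

```python
import numpy as np

def gcn_layer(a, x, w):
    # one GCN propagation step: H' = ReLU(D^{-1/2} (A + I) D^{-1/2} X W)
    # `a` may hold binary or real-valued edge weights; self-loops are added
    a_hat = a + np.eye(a.shape[0])
    d_inv_sqrt = np.diag(1.0 / np.sqrt(a_hat.sum(axis=1)))
    return np.maximum(d_inv_sqrt @ a_hat @ d_inv_sqrt @ x @ w, 0.0)

# 4-node ring graph, 3 input features, 2 output features
a = np.array([[0, 1, 0, 1],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [1, 0, 1, 0]], dtype=float)
h = gcn_layer(a, np.ones((4, 3)), np.ones((3, 2)))
```

Because this rule accepts real-valued entries in `a` unchanged, replacing the hand-designed binary AM with a learned weighted one, as the paper proposes, requires no change to the propagation step itself.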
Citations: 2
Augmentation of Small Training Data Using GANs for Enhancing the Performance of Image Classification
Pub Date : 2021-01-10 DOI: 10.1109/ICPR48806.2021.9412399
S. Hung, J. Q. Gan
It is difficult to achieve high performance without sufficient training data for deep convolutional neural networks (DCNNs) to learn from. Data augmentation plays an important role in improving robustness and preventing overfitting in machine learning for many applications, such as image classification. In this paper, a novel data augmentation method is proposed to address machine learning with small training datasets. The proposed method uses generative adversarial networks (GANs) to synthesise similar images with rich diversity from only a single original training sample, increasing the amount of training data. The synthesised images are expected to possess class-informative features that may appear in the validation or testing data but not in the training data because the training dataset is small; they can therefore be effective as augmented training data for improving the classification accuracy of DCNNs. The experimental results demonstrate that the proposed method, with a novel GAN framework for image training data augmentation, can significantly enhance the classification performance of DCNNs in applications where original training data is limited.
Citations: 4
MEAN: Multi-Element Attention Network for Scene Text Recognition
Pub Date : 2021-01-10 DOI: 10.1109/ICPR48806.2021.9413166
Ruijie Yan, Liangrui Peng, Shanyu Xiao, Gang Yao, Jaesik Min
Scene text recognition is a challenging problem due to the wide variance in the contents, styles, orientations, and image quality of text instances in natural scene images. To learn the intrinsic representation of scene text, a novel multi-element attention (MEA) mechanism is proposed to exploit geometric structures, from local to global levels, in feature maps extracted from a scene text image. The MEA mechanism is a generalized form of self-attention. The elements in the feature maps are taken as the nodes of an undirected graph, and three kinds of adjacency matrices are designed to aggregate information at the local, neighborhood, and global levels before the attention weights are calculated. A multi-element attention network (MEAN) is implemented, which includes a CNN for feature extraction, an encoder with the MEA mechanism, and a decoder for predicting text codes. Orientational positional encoding is added to the feature maps output by the CNN, and a feature vector sequence transformed from the feature maps is used as the input of the encoder. Experimental results show that MEAN achieves state-of-the-art or competitive performance on seven public English scene text datasets (IIIT5K, SVT, IC03, IC13, IC15, SVTP, and CUTE). Further experiments on a selected subset of the RCTW Chinese scene text dataset demonstrate that MEAN can handle horizontal, vertical, and irregular scene text samples.
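The core idea of restricting self-attention with graph adjacency can be sketched generically: scores between non-adjacent nodes are masked out before the softmax. This is an illustrative sketch with a single adjacency matrix; the paper uses three (local, neighborhood, global), and nothing here reproduces its exact formulation:

```python
import numpy as np

def masked_self_attention(x, adj):
    # x: (L, D) feature-map elements treated as graph nodes
    # adj: (L, L) adjacency; attention only flows along edges (and self-loops)
    scores = x @ x.T / np.sqrt(x.shape[1])
    scores = np.where(adj > 0, scores, -np.inf)   # mask out non-neighbors
    scores -= scores.max(axis=1, keepdims=True)   # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True)
    return weights @ x, weights

rng = np.random.default_rng(0)
x = rng.normal(size=(5, 4))
adj = np.eye(5) + np.diag(np.ones(4), 1) + np.diag(np.ones(4), -1)  # path graph
out, w = masked_self_attention(x, adj)
```

With an all-ones adjacency this reduces to ordinary self-attention, which is the sense in which MEA generalizes the standard mechanism.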
Citations: 1
Writer Identification Using Deep Neural Networks: Impact of Patch Size and Number of Patches
Pub Date : 2021-01-10 DOI: 10.1109/ICPR48806.2021.9412575
Akshay Punjabi, J. R. Prieto, E. Vidal
Traditional approaches to recognizing or identifying the writer of a handwritten text image used to rely on heuristic knowledge about the shape and other features of the strokes of previously segmented characters. However, recent works have made significant advances in the state of the art thanks to the use of various types of deep neural networks. In most of these works, text images are decomposed into patches, which are processed by the networks without any prior character or word segmentation. In this paper, we study how the way images are decomposed into patches impacts recognition accuracy, using three publicly available datasets. The study also includes a simpler architecture where no patches are used at all: a single deep neural network takes a whole text image as input and directly provides a writer recognition hypothesis. Results show that bigger patches generally lead to improved accuracy, achieving on one of the datasets a significant improvement over the best results reported so far.
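The patch decomposition whose parameters the study varies can be sketched as a simple sliding-window crop (illustrative; the patch and stride values below are arbitrary, not those of the paper):

```python
import numpy as np

def extract_patches(img, patch, stride):
    # slide a square window of side `patch` over the image with step `stride`;
    # patch size and stride together determine how many patches the network sees
    h, w = img.shape
    crops = [img[i:i + patch, j:j + patch]
             for i in range(0, h - patch + 1, stride)
             for j in range(0, w - patch + 1, stride)]
    return np.stack(crops)

line = np.zeros((64, 128))  # a dummy grayscale text-line image
patches = extract_patches(line, patch=32, stride=16)  # 3 x 7 = 21 patches
```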
Citations: 1
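The dense patch decomposition that this entry studies can be sketched as follows. This is a minimal illustration, not the authors' implementation: the `extract_patches` helper, the patch sizes, and the non-overlapping stride policy are all assumptions chosen for demonstration.

```python
import numpy as np

def extract_patches(image: np.ndarray, patch: int, stride: int) -> np.ndarray:
    """Slide a square window over a grayscale text image and stack the crops.

    Windows that would overrun the border are dropped; the paper's exact
    cropping and padding policy is not specified here.
    """
    h, w = image.shape
    crops = [
        image[y:y + patch, x:x + patch]
        for y in range(0, h - patch + 1, stride)
        for x in range(0, w - patch + 1, stride)
    ]
    return np.stack(crops) if crops else np.empty((0, patch, patch))

# A toy 64x128 "text line": bigger patches yield fewer, more contextual
# inputs per image, which is the trade-off the paper studies.
line = np.random.rand(64, 128)
small = extract_patches(line, patch=32, stride=32)
large = extract_patches(line, patch=64, stride=64)
print(small.shape, large.shape)  # (8, 32, 32) (2, 64, 64)
```

The patch-free baseline in the paper corresponds to skipping this step entirely and feeding `line` to the network as a single input.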
A Deep Learning-Based Method for Predicting Volumes of Nasopharyngeal Carcinoma for Adaptive Radiation Therapy Treatment 基于深度学习的鼻咽癌适应性放疗体积预测方法
Pub Date : 2021-01-10 DOI: 10.1109/ICPR48806.2021.9412924
Bilel Daoud, K. Morooka, Shoko Miyauchi, R. Kurazume, W. Mnejja, L. Farhat, J. Daoud
This paper presents a new system for predicting the spatial change of Nasopharyngeal carcinoma(NPC) and organ-at-risks (OARs) volumes over the course of the radiation therapy (RT) treatment for facilitating the workflow of adaptive radiotherapy. The proposed system, called “Tumor Evolution Prediction (TEP-Net)”, predicts the spatial distributions of NPC and 5 OARs, separately, in response to RT in the coming week, week n. Here, TEP-Net has (n-1)-inputs that are week 1 to week n-1 of CT axial, coronal or sagittal images acquired once the patient complete the planned RT treatment of the corresponding week. As a result, three predicted results of each target region are obtained from the three-view CT images. To determine the final prediction of NPC and 5 OARs, two integration methods, weighted fully connected layers and weighted voting methods, are introduced. From the experiments using weekly CT images of 140 NPC patients, our proposed system achieves the best performance for predicting NPC and OARs compared with conventional methods.
{"title":"A Deep Learning-Based Method for Predicting Volumes of Nasopharyngeal Carcinoma for Adaptive Radiation Therapy Treatment","authors":"Bilel Daoud, K. Morooka, Shoko Miyauchi, R. Kurazume, W. Mnejja, L. Farhat, J. Daoud","doi":"10.1109/ICPR48806.2021.9412924","DOIUrl":"https://doi.org/10.1109/ICPR48806.2021.9412924","url":null,"abstract":"This paper presents a new system for predicting the spatial change of Nasopharyngeal carcinoma(NPC) and organ-at-risks (OARs) volumes over the course of the radiation therapy (RT) treatment for facilitating the workflow of adaptive radiotherapy. The proposed system, called “Tumor Evolution Prediction (TEP-Net)”, predicts the spatial distributions of NPC and 5 OARs, separately, in response to RT in the coming week, week n. Here, TEP-Net has (n-1)-inputs that are week 1 to week n-1 of CT axial, coronal or sagittal images acquired once the patient complete the planned RT treatment of the corresponding week. As a result, three predicted results of each target region are obtained from the three-view CT images. To determine the final prediction of NPC and 5 OARs, two integration methods, weighted fully connected layers and weighted voting methods, are introduced. 
From the experiments using weekly CT images of 140 NPC patients, our proposed system achieves the best performance for predicting NPC and OARs compared with conventional methods.","PeriodicalId":6783,"journal":{"name":"2020 25th International Conference on Pattern Recognition (ICPR)","volume":"9 1","pages":"3256-3263"},"PeriodicalIF":0.0,"publicationDate":"2021-01-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"80984587","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
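Of the two integration methods this entry names, the weighted-voting fusion of the three per-view predictions can be sketched as below. This is a hedged sketch, not TEP-Net itself: the `weighted_vote` helper, the per-view weights, and the threshold are illustrative assumptions, and the views are assumed to be already resampled to a common grid.

```python
import numpy as np

def weighted_vote(masks, weights, threshold=0.5):
    """Fuse per-view probability maps into one prediction by weighted voting.

    `masks` holds one probability map per view (e.g. axial, coronal,
    sagittal); the weights here are illustrative, not values learned
    by the paper's system.
    """
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()   # normalise so the votes sum to 1
    stacked = np.stack(masks)           # shape: (views, ...) probability maps
    fused = np.tensordot(weights, stacked, axes=1)
    return (fused >= threshold).astype(np.uint8)

# Toy 2x2 probability maps standing in for the three CT views.
axial    = np.array([[0.9, 0.2], [0.4, 0.8]])
coronal  = np.array([[0.7, 0.1], [0.6, 0.9]])
sagittal = np.array([[0.8, 0.3], [0.2, 0.7]])
pred = weighted_vote([axial, coronal, sagittal], weights=[0.5, 0.3, 0.2])
print(pred)  # binary mask: voxels whose weighted vote reaches the threshold
```

The paper's alternative, weighted fully connected layers, would learn this fusion end-to-end instead of fixing the weights.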
A delayed Elastic-Net approach for performing adversarial attacks 执行对抗性攻击的延迟弹性网络方法
Pub Date : 2021-01-10 DOI: 10.1109/ICPR48806.2021.9413170
Brais Cancela, V. Bolón-Canedo, Amparo Alonso-Betanzos
With the rise of the so-called Adversarial Attacks, there is an increased concern on model security. In this paper we present two different contributions: novel measures of robustness (based on adversarial attacks) and a novel adversarial attack. The key idea behind these metrics is to obtain a measure that could compare different architectures, with independence of how the input is preprocessed (robustness against different input sizes and value ranges). To do so, a novel adversarial attack is presented, performing a delayed elastic-net adversarial attack (constraints are only used whenever a successful adversarial attack is obtained). Experimental results show that our approach obtains state-of-the-art adversarial samples, in terms of minimal perturbation distance. Finally, a benchmark of ImageNet pretrained models is used to conduct experiments aiming to shed some light about which model should be selected whenever security is a role factor.
{"title":"A delayed Elastic-Net approach for performing adversarial attacks","authors":"Brais Cancela, V. Bolón-Canedo, Amparo Alonso-Betanzos","doi":"10.1109/ICPR48806.2021.9413170","DOIUrl":"https://doi.org/10.1109/ICPR48806.2021.9413170","url":null,"abstract":"With the rise of the so-called Adversarial Attacks, there is an increased concern on model security. In this paper we present two different contributions: novel measures of robustness (based on adversarial attacks) and a novel adversarial attack. The key idea behind these metrics is to obtain a measure that could compare different architectures, with independence of how the input is preprocessed (robustness against different input sizes and value ranges). To do so, a novel adversarial attack is presented, performing a delayed elastic-net adversarial attack (constraints are only used whenever a successful adversarial attack is obtained). Experimental results show that our approach obtains state-of-the-art adversarial samples, in terms of minimal perturbation distance. Finally, a benchmark of ImageNet pretrained models is used to conduct experiments aiming to shed some light about which model should be selected whenever security is a role factor.","PeriodicalId":6783,"journal":{"name":"2020 25th International Conference on Pattern Recognition (ICPR)","volume":"186 3 1","pages":"378-384"},"PeriodicalIF":0.0,"publicationDate":"2021-01-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"81074114","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1
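The "delayed" use of the elastic-net penalty described in this entry can be sketched as follows. The function names and the gating condition are assumptions read from the abstract, not the paper's actual optimization loop: the only point illustrated is that the L1 + L2 penalty on the perturbation is switched on only after a successful adversarial example has been found.

```python
import numpy as np

def elastic_net(delta: np.ndarray, beta: float = 1.0) -> float:
    """Elastic-net penalty on a perturbation: L1 plus beta times squared L2."""
    return float(np.abs(delta).sum() + beta * (delta ** 2).sum())

def delayed_penalty(delta: np.ndarray, attack_succeeded: bool,
                    beta: float = 1.0) -> float:
    """Gate the penalty on attack success, as the abstract describes:
    early iterations search for any adversarial example unconstrained,
    later ones also shrink the perturbation."""
    return elastic_net(delta, beta) if attack_succeeded else 0.0

delta = np.array([0.5, -0.25, 0.0])
print(delayed_penalty(delta, attack_succeeded=False))  # 0.0
print(delayed_penalty(delta, attack_succeeded=True))   # 0.75 + 0.3125 = 1.0625
```

In a full attack this penalty would be added to the adversarial loss at each gradient step; the sketch isolates only the delayed-constraint idea.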