
Latest publications: 2020 25th International Conference on Pattern Recognition (ICPR)

Probabilistic Word Embeddings in Kinematic Space
Pub Date : 2021-01-10 DOI: 10.1109/ICPR48806.2021.9412050
Adarsh Jamadandi, Rishabh Tigadoli, R. Tabib, U. Mudenagudi
In this paper, we propose a method for learning representations in a space of Gaussian-like distributions defined on a novel geometrical space called Kinematic space. The utility of non-Euclidean geometry for deep representation learning has recently been in vogue; in particular, models of hyperbolic geometry such as the Poincaré and Lorentz models have proven useful for learning hierarchical representations. Going beyond manifolds with constant curvature, although it offers better representation capacity, risks giving up computationally tractable tools such as Riemannian optimization methods. Here, we explore a pseudo-Riemannian auxiliary Lorentzian space called Kinematic space and provide a principled approach for constructing a Gaussian-like distribution that is compatible with gradient-based learning methods, and use it to formulate a probabilistic word embedding framework. Rather than mapping lexically distributed representations to single point vectors in Euclidean space, we advocate mapping entities to density-based representations, as this provides explicit control over the uncertainty in the representations. We test our framework by embedding the WordNet-Noun hierarchy, a large lexical database; our experiments report strong, consistent improvements in Mean Rank and Mean Average Precision (MAP) values compared to probabilistic word embedding frameworks defined on Euclidean and hyperbolic spaces. We show an average improvement of 72.68% in MAP and 82.60% in Mean Rank compared to the hyperbolic version. Our work serves as evidence for the utility of novel geometrical spaces for learning hierarchical representations.
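To make the density-based (rather than point-based) embedding idea concrete, the sketch below represents each word as a diagonal Gaussian in ordinary Euclidean space and scores a hypernym pair with a KL divergence. This is a generic illustration only: the paper's Kinematic-space distribution and its Riemannian treatment are not reproduced, and all names and values are made up.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab = ["entity", "animal", "dog"]          # toy WordNet-style chain
dim = 4

# Each word is a density, not a point: a mean and a (diagonal) log std-dev.
mu = {w: rng.normal(scale=0.1, size=dim) for w in vocab}
log_sigma = {w: np.zeros(dim) for w in vocab}

def kl_diag_gaussian(w_child, w_parent):
    """KL(child || parent) for diagonal Gaussians; small when the parent
    'covers' the child, which is one way hypernymy can be scored."""
    m1, m2 = mu[w_child], mu[w_parent]
    s1, s2 = np.exp(2 * log_sigma[w_child]), np.exp(2 * log_sigma[w_parent])
    return 0.5 * np.sum(s1 / s2 + (m2 - m1) ** 2 / s2 - 1.0 + np.log(s2 / s1))

# Uncertainty is explicit: widening a word's variance changes its relations.
log_sigma["entity"][:] = 0.5                 # a broad, generic concept
print(kl_diag_gaussian("dog", "entity"), kl_diag_gaussian("entity", "dog"))
```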
Citations: 0
BAT Optimized CNN Model Identifies Water Stress in Chickpea Plant Shoot Images
Pub Date : 2021-01-10 DOI: 10.1109/ICPR48806.2021.9412720
S. Azimi, T. Kaur, T. Gandhi
Stress due to water deficiency in plants can significantly lower agricultural yield. It can affect many visible plant traits, such as size and surface area, the number of leaves, and their color. In recent years, computer vision-based plant phenomics has emerged as a promising tool for plant research and management. Such techniques have the advantage of being non-destructive, non-invasive, fast, and highly automated. Pulses like chickpeas play an important role in ensuring food security in poor countries owing to their high protein and nutrition content. In the present work, we have built a dataset comprising shoot images of two chickpea varieties under different moisture stress conditions. Specifically, we propose a BAT-optimized ResNet-18 model for classifying stress induced by water deficiency using chickpea shoot images. The BAT algorithm identifies the optimal mini-batch size to be used for training, rather than relying on the traditional manual approach of trial and error. Experimentation on two crop varieties (JG and Pusa) reveals that the BAT-optimized approach achieves accuracies of 96% and 91% for the JG and Pusa varieties respectively, which is better than the traditional method by 4%. The experimental results are also compared with state-of-the-art CNN models such as AlexNet, GoogLeNet, and ResNet-50. The comparison results demonstrate that the proposed BAT-optimized ResNet-18 model achieves higher performance than its counterparts.
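A minimal sketch of a bat-style search over the mini-batch size: `train_and_validate` is a hypothetical stand-in for training ResNet-18 on the chickpea shoot images and returning validation accuracy (a toy surrogate is used here), and the algorithm constants are illustrative rather than the paper's settings.

```python
import math
import random

def train_and_validate(batch_size: int) -> float:
    """Hypothetical stand-in: train ResNet-18 with this mini-batch size and
    return validation accuracy. A toy surrogate is used for illustration."""
    return 1.0 - abs(batch_size - 48) / 100.0

def bat_search(lo=8, hi=128, n_bats=5, n_iter=15, alpha=0.9, gamma=0.9):
    pos = [random.uniform(lo, hi) for _ in range(n_bats)]   # candidate batch sizes
    vel = [0.0] * n_bats
    loud, rate = [1.0] * n_bats, [0.5] * n_bats             # loudness A_i, pulse rate r_i
    fit = [train_and_validate(round(p)) for p in pos]
    best, best_fit = max(zip(pos, fit), key=lambda z: z[1])
    for t in range(1, n_iter + 1):
        for i in range(n_bats):
            freq = random.random()                          # frequency in [0, 1]
            vel[i] += (pos[i] - best) * freq
            cand = min(max(pos[i] + vel[i], lo), hi)
            if random.random() > rate[i]:                   # local random walk around best
                cand = min(max(best + random.gauss(0, 2), lo), hi)
            f = train_and_validate(round(cand))
            if f > fit[i] and random.random() < loud[i]:    # accept; shrink loudness, raise rate
                pos[i], fit[i] = cand, f
                loud[i] *= alpha
                rate[i] = 0.5 * (1 - math.exp(-gamma * t))
            if f > best_fit:
                best, best_fit = cand, f
    return round(best)

print("selected mini-batch size:", bat_search())
```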
Citations: 7
Semantic Segmentation Refinement Using Entropy and Boundary-guided Monte Carlo Sampling and Directed Regional Search
Pub Date : 2021-01-10 DOI: 10.1109/ICPR48806.2021.9413099
Zitang Sun, S. Kamata, Ruojing Wang
Semantic segmentation requires both a large receptive field and accurate spatial information. Although existing methods based on fully convolutional networks have greatly improved accuracy, the predictions are still not satisfactory when parsing small objects and boundary regions. We propose a refinement algorithm to improve the results generated by the front-end network. Our method uses a modified double-branch network to generate both segmentation masks and semantic boundaries, which serve as the refinement algorithm's input. We introduce information entropy to represent the confidence of the neural network's prediction at each pixel. The information entropy, combined with the semantic boundary, can capture unpredictable, low-confidence pixels through Monte Carlo sampling. Each selected pixel serves as an initial seed for directed local search and refinement. Starting from an initial seed, the goal is to search the neighboring high-confidence regions, and re-labeling is based on these high-confidence results. Notably, our method adopts a directed regional search strategy based on gradient descent to find high-confidence regions effectively. Our method can be flexibly embedded into existing encoder backbones at trivial computational cost. Our refinement algorithm further improves the accuracy of state-of-the-art methods on both the Cityscapes and PASCAL VOC datasets. On small objects in particular, our method surpasses most state-of-the-art methods.
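A minimal sketch, assuming per-pixel softmax probabilities are available, of how information entropy can be computed and used to draw low-confidence seed pixels by Monte Carlo sampling; the semantic-boundary guidance and the directed regional search of the paper are not reproduced, and all sizes are toy values.

```python
import numpy as np

rng = np.random.default_rng(0)
H, W, C = 64, 64, 19                        # toy map: 19 classes (e.g. Cityscapes)
logits = rng.normal(size=(H, W, C))

# Per-pixel softmax probabilities and information entropy.
probs = np.exp(logits - logits.max(axis=-1, keepdims=True))
probs /= probs.sum(axis=-1, keepdims=True)
entropy = -(probs * np.log(probs + 1e-12)).sum(axis=-1)     # high = low confidence

# Monte Carlo sampling: pixels are drawn with probability proportional to their
# entropy, so unpredictable pixels are preferred as seeds for local refinement.
flat = entropy.ravel()
seeds = rng.choice(flat.size, size=200, replace=False, p=flat / flat.sum())
ys, xs = np.unravel_index(seeds, (H, W))
print("first seed pixels (y, x):", list(zip(ys[:5], xs[:5])))
```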
Citations: 0
Visual Saliency Oriented Vehicle Scale Estimation
Pub Date : 2021-01-10 DOI: 10.1109/ICPR48806.2021.9412618
Jiali Ding, Tie Liu, Qixin Chen, Zejian Yuan, Yuanyuan Shang
Vehicle scale estimation with a single camera is a typical application in intelligent transportation, and it faces challenges from visual computing in which intensity-based and descriptor-based methods must be balanced. This paper proposes a vehicle scale estimation method based on salient object detection to address this problem. A regularized intensity matching method is formulated in the Lie algebra to achieve robust and accurate scale estimation, and descriptor matching and intensity matching are combined to minimize the proposed loss function. A visual attention mechanism is designed to select textured image patches and remove occluded ones. Weights are then assigned to pixels from the selected image patches, which alleviates the influence of noise-corrupted pixels. The experiments show that the proposed method significantly outperforms state-of-the-art methods in the robustness and accuracy of vehicle scale estimation.
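For illustration only, the sketch below shows the general shape of a loss that balances a per-pixel-weighted intensity (photometric) term with a descriptor term; it is not the paper's actual formulation (in particular, the Lie-algebra parameterization and the saliency-driven patch selection are omitted), and all names and the weighting factor `lam` are made up.

```python
import numpy as np

def combined_matching_loss(i_ref, i_cur, d_ref, d_cur, pixel_w, lam=0.5):
    """Weighted intensity-matching term plus a descriptor-matching term.
    pixel_w down-weights noise-corrupted pixels from the selected patches."""
    photometric = np.sum(pixel_w * (i_ref - i_cur) ** 2) / np.sum(pixel_w)
    descriptor = np.mean(np.sum((d_ref - d_cur) ** 2, axis=-1))
    return photometric + lam * descriptor

# Toy inputs: 100 patch pixels and 20 matched descriptors of dimension 32.
rng = np.random.default_rng(0)
i_ref, i_cur = rng.random(100), rng.random(100)
d_ref, d_cur = rng.random((20, 32)), rng.random((20, 32))
pixel_w = rng.random(100)                 # confidence weights per pixel
print(combined_matching_loss(i_ref, i_cur, d_ref, d_cur, pixel_w))
```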
Citations: 0
RefiNet: 3D Human Pose Refinement with Depth Maps
Pub Date : 2021-01-10 DOI: 10.1109/ICPR48806.2021.9412451
Andrea D'Eusanio, S. Pini, G. Borghi, R. Vezzani, R. Cucchiara
Human Pose Estimation is a fundamental task for many applications in the computer vision community, and it has been widely investigated in the 2D domain, i.e. on intensity images. Consequently, most of the available methods for this task are based on 2D Convolutional Neural Networks and huge manually annotated RGB datasets, achieving stunning results. In this paper, we propose RefiNet, a multi-stage framework that regresses an extremely precise 3D human pose from a given 2D pose and a depth map. The framework consists of three different modules, each specialized in a particular refinement and data representation, i.e. depth patches, the 3D skeleton, and point clouds. Moreover, we present a new dataset, called Baracca, acquired with RGB, depth, and thermal cameras and specifically created for the automotive context. Experimental results confirm the quality of the refinement procedure, which largely improves the human pose estimates of off-the-shelf 2D methods.
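For a sense of the depth-based starting point such a refinement pipeline works from, the sketch below back-projects 2D joints to 3D camera coordinates with a pinhole model using the depth sampled at each joint. It only illustrates this lifting step; RefiNet's learned refinement modules are not reproduced, and the intrinsics and joint locations are made-up values.

```python
import numpy as np

def lift_joints_to_3d(joints_2d, depth_map, fx, fy, cx, cy):
    """Back-project (u, v) joint locations to 3D camera coordinates using the
    depth value sampled at each joint (pinhole camera model)."""
    pts = []
    for u, v in joints_2d:
        z = depth_map[int(round(v)), int(round(u))]   # depth in metres
        x = (u - cx) * z / fx
        y = (v - cy) * z / fy
        pts.append((x, y, z))
    return np.array(pts)

# Toy example with made-up intrinsics and a flat depth map 2 m away.
depth = np.full((480, 640), 2.0)
joints = [(320.0, 240.0), (350.0, 200.0)]
print(lift_joints_to_3d(joints, depth, fx=525.0, fy=525.0, cx=319.5, cy=239.5))
```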
Citations: 5
Webly Supervised Image-Text Embedding with Noisy Tag Refinement
Pub Date : 2021-01-10 DOI: 10.1109/ICPR48806.2021.9412106
Niluthpol Chowdhury Mithun, Ravdeep Pasricha, E. Papalexakis, A. Roy-Chowdhury
In this paper, we address the problem of utilizing web images to train robust joint embedding models for the image-text retrieval task. Prior webly supervised approaches directly leverage weakly annotated web images in the joint embedding learning framework. The objective of these approaches suffers significantly when the ratio of noisy and missing tags associated with the web images is very high. In this regard, we propose a CP-decomposition-based tensor completion framework to refine the tags of web images by modeling the observed ternary inter-relations between the sets of labeled images, tags, and web images as a tensor. To deal effectively with the high ratio of missing entries likely in our case, we incorporate intra-modal correlation as side information in the proposed framework. Our tag refinement approach, combined with existing webly supervised image-text embedding approaches, provides a more principled way to learn joint embedding models in the presence of significant noise from web data and limited clean labeled data. Experiments on benchmark datasets demonstrate that the proposed approach achieves a significant performance gain in image-text retrieval.
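As a rough sketch of the general mechanism behind CP-decomposition-based tensor completion, the code below fits a rank-R CP factorization to the observed entries of a toy ternary-relation tensor by gradient descent and reads refined values off the completed tensor. The side information and the actual construction of the image/tag/web-image tensor in the paper are not included; all sizes and the learning rate are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
I, J, K, R = 20, 15, 10, 3                  # labeled images x tags x web images, CP rank
X = rng.random((I, J, K))                   # toy ternary relation tensor
mask = rng.random((I, J, K)) < 0.3          # only 30% of entries are observed

A, B, C = (0.1 * rng.standard_normal((n, R)) for n in (I, J, K))
lr = 0.05
for _ in range(500):
    # Reconstruction X_hat[i,j,k] = sum_r A[i,r] * B[j,r] * C[k,r]
    X_hat = np.einsum("ir,jr,kr->ijk", A, B, C)
    E = mask * (X_hat - X)                  # error only on observed entries
    A -= lr * np.einsum("ijk,jr,kr->ir", E, B, C)
    B -= lr * np.einsum("ijk,ir,kr->jr", E, A, C)
    C -= lr * np.einsum("ijk,ir,jr->kr", E, A, B)

# Missing or noisy tags would be refined by reading off the completed tensor.
completed = np.einsum("ir,jr,kr->ijk", A, B, C)
print("mean absolute error on observed entries:", np.abs(mask * (completed - X)).mean())
```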
Citations: 0
Ordinal Depth Classification Using Region-based Self-attention
Pub Date : 2021-01-10 DOI: 10.1109/ICPR48806.2021.9412477
Minh-Hieu Phan, S. L. Phung, A. Bouzerdoum
Depth perception is essential for scene understanding, autonomous navigation, and augmented reality. Depth estimation from a single 2D image is challenging due to the lack of reliable cues, e.g. stereo correspondences and motion. Modern approaches exploit multi-scale feature extraction to provide more powerful representations for deep networks. However, these studies only use simple addition or concatenation to combine the extracted multi-scale features. This paper proposes a novel region-based self-attention (rSA) unit for effective feature fusion. The rSA recalibrates the multi-scale responses by explicitly modelling the dependency between channels in separate image regions. We discretize continuous depths to formulate an ordinal depth classification problem in which the relative order between categories is preserved. The experiments are performed on a dataset of 4410 RGB-D images captured in outdoor environments at the University of Wollongong's campus. The proposed module improves the models on small-sized datasets by 22% to 40%.
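A small sketch of an order-preserving depth discretization of the kind this formulation relies on: continuous depths are binned (here uniformly in log depth, a common choice rather than necessarily the paper's) so that a larger class index always corresponds to a larger depth; the ranges and bin count are illustrative.

```python
import numpy as np

def ordinal_depth_labels(depth_m, d_min=1.0, d_max=80.0, n_bins=40):
    """Map continuous depths to ordinal class indices 0..n_bins-1.
    Bins are uniform in log-depth, so the ordering of depths is preserved."""
    d = np.clip(depth_m, d_min, d_max)
    t = (np.log(d) - np.log(d_min)) / (np.log(d_max) - np.log(d_min))
    return np.minimum((t * n_bins).astype(int), n_bins - 1)

depths = np.array([1.2, 3.7, 3.8, 15.0, 79.0])
labels = ordinal_depth_labels(depths)
print(labels)                        # class indices, non-decreasing with depth
assert np.all(np.diff(labels) >= 0)  # relative order between categories is preserved
```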
Citations: 2
Responsive Social Smile: A Machine Learning based Multimodal Behavior Assessment Framework towards Early Stage Autism Screening
Pub Date : 2021-01-10 DOI: 10.1109/ICPR48806.2021.9412766
Yueran Pan, Kunjing Cai, Ming Cheng, Xiaobing Zou, Ming Li
Autism spectrum disorder (ASD) is a neuro-developmental disorder that causes deficits in social functioning. Early ASD screening of young children is important to reduce the impact of ASD on people's lives. Traditional screening methods mainly rely on protocol-based interviews and subjective evaluations by clinicians and domain experts, which require advanced expertise and intensive labor. To standardize the process of ASD screening, we design a “Responsive Social Smile” protocol and the associated experimental setup. Moreover, we propose a machine learning based assessment framework for early ASD screening. By integrating speech recognition and computer vision technologies, the proposed framework can quantitatively analyze children's behaviors under well-designed protocols. We collect 196 stimulus samples from 41 children with an average age of 23.34 months, and the proposed method obtains 85.20% accuracy for predicting stimulus scores and 80.49% accuracy for the final ASD prediction. This result indicates that our model approaches the average level of domain experts under this “Responsive Social Smile” protocol.
Citations: 4
Stroke Based Posterior Attention for Online Handwritten Mathematical Expression Recognition
Pub Date : 2021-01-10 DOI: 10.1109/ICPR48806.2021.9412790
Chang Jie Wu, Qing Wang, Jianshu Zhang, Jun Du, Jiaming Wang, Jiajia Wu, Jinshui Hu
Recently, many studies have proposed employing attention-based encoder-decoder models to convert a sequence of trajectory points into a LaTeX string for online handwritten mathematical expression recognition (OHMER), and the recognition performance of these models critically relies on the accuracy of the attention. In this paper, unlike previous methods that basically employ a soft attention model, we propose a posterior attention model, which modifies the attention probabilities after observing the output probabilities generated by the soft attention model. To further improve the posterior attention mechanism, we propose a stroke average pooling layer to aggregate point-level features obtained from the encoder into stroke-level features. We argue that posterior attention is better implemented on stroke-level features than on point-level features, as the output probabilities generated from strokes are more convincing than those generated from points, and we support this through experimental analysis. Validated on the CROHME competition task, we demonstrate that stroke-based posterior attention achieves expression recognition rates of 54.26% on CROHME 2014 and 51.75% on CROHME 2016. Through attention visualization analysis, we empirically demonstrate that the posterior attention mechanism achieves better alignment accuracy than the soft attention mechanism.
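A minimal sketch of the stroke-average-pooling idea, assuming each trajectory point carries an encoder feature and a stroke index: point-level features are averaged within each stroke to give stroke-level features for the attention to operate on. Shapes and names are illustrative, not the paper's implementation.

```python
import numpy as np

def stroke_average_pooling(point_feats, stroke_ids):
    """Average point-level features (T, D) into stroke-level features (S, D),
    where stroke_ids[t] says which stroke trajectory point t belongs to."""
    strokes = np.unique(stroke_ids)
    return np.stack([point_feats[stroke_ids == s].mean(axis=0) for s in strokes])

T, D = 12, 8
point_feats = np.random.rand(T, D)               # encoder outputs per trajectory point
stroke_ids = np.array([0]*5 + [1]*3 + [2]*4)     # 3 strokes of 5, 3, and 4 points
stroke_feats = stroke_average_pooling(point_feats, stroke_ids)
print(stroke_feats.shape)                        # (3, 8): one feature per stroke
```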
Citations: 0
VPU Specific CNNs through Neural Architecture Search
Pub Date : 2021-01-10 DOI: 10.1109/ICPR48806.2021.9412794
Ciarán Donegan, H. Yous, Saksham Sinha, Jonathan Byrne
The success of deep learning at computer vision tasks has led to an ever-increasing number of applications on edge devices, often with the use of edge AI hardware accelerators such as the Intel Movidius Vision Processing Unit (VPU). Performing computer vision tasks on edge devices is challenging: many Convolutional Neural Networks (CNNs) are too complex to run on edge devices with limited computing power. This has created great interest in designing efficient CNNs, and one promising way of doing so is through Neural Architecture Search (NAS). NAS aims to automate the design of neural networks. NAS can also optimize multiple objectives together, such as accuracy and efficiency, which is difficult for humans. In this paper, we use a differentiable NAS method to find efficient CNNs for the VPU that achieve state-of-the-art classification accuracy on ImageNet. Our NAS-designed model outperforms MobileNetV2, having almost 1% higher top-1 accuracy while being 13% faster on the MyriadX VPU. To the best of our knowledge, this is the first time a VPU-specific CNN has been designed using a NAS algorithm. Our results also reiterate that efficient networks must be designed for each specific hardware platform. We show that efficient networks targeted at other devices do not perform as well on the VPU.
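For intuition about how a differentiable NAS method makes the architecture searchable by gradient descent, the sketch below implements a DARTS-style mixed operation in which softmax-weighted architecture parameters blend candidate operations on an edge. The candidate operations, and the absence of any VPU latency term, are simplifications; this is not the search space used in the paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MixedOp(nn.Module):
    """One searchable edge: a softmax over architecture parameters alpha decides
    how much each candidate operation contributes to the edge's output."""
    def __init__(self, channels):
        super().__init__()
        self.ops = nn.ModuleList([
            nn.Conv2d(channels, channels, 3, padding=1, bias=False),              # regular conv
            nn.Conv2d(channels, channels, 3, padding=1, groups=channels,
                      bias=False),                                                # depthwise conv
            nn.Identity(),                                                        # skip connection
        ])
        self.alpha = nn.Parameter(torch.zeros(len(self.ops)))                     # architecture params

    def forward(self, x):
        weights = F.softmax(self.alpha, dim=0)
        return sum(w * op(x) for w, op in zip(weights, self.ops))

edge = MixedOp(channels=16)
x = torch.randn(1, 16, 32, 32)
out = edge(x)
out.mean().backward()            # alpha receives gradients like any other parameter
print(out.shape, edge.alpha.grad)
```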
Citations: 4