
Latest publications in Pattern Recognition Letters

Generating neural architectures from parameter spaces for multi-agent reinforcement learning
IF 3.9 · CAS Tier 3 (Computer Science) · Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE · Pub Date: 2024-09-01 · DOI: 10.1016/j.patrec.2024.07.013
Corentin Artaud, Varuna De-Silva, Rafael Pina, Xiyu Shi

We explore a data-driven approach to generating neural network parameters to determine whether generative models can capture the underlying distribution of a collection of neural network checkpoints. We compile a dataset of checkpoints from neural networks trained within the multi-agent reinforcement learning framework, thus potentially producing previously unseen combinations of neural network parameters. In particular, our generative model is a conditional transformer-based variational autoencoder that, when provided with random noise and a specified performance metric – in our context, returns – predicts the appropriate distribution over the parameter space to achieve the desired performance. Our method successfully generates parameters for a specified optimal return without further fine-tuning. We also show that the parameters generated using this approach are more constrained and less variable and, most importantly, perform on par with those trained directly under the multi-agent reinforcement learning framework. We test our method on the neural network architectures commonly employed in state-of-the-art algorithms.
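The generation interface described above can be sketched as follows. This is a hedged illustration only: the decoder here is a stand-in random affine map, whereas the paper uses a trained conditional transformer-based VAE decoder, and all dimensions, weight names, and the `generate_parameters` helper are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: latent noise dimension and the flattened parameter
# vector of a small policy network (both illustrative).
LATENT, PARAM = 8, 64

# Stand-in for the trained conditional decoder p(theta | z, return):
# a fixed random affine map instead of the paper's transformer-based VAE.
W_z = rng.normal(size=(PARAM, LATENT)) * 0.1
W_r = rng.normal(size=(PARAM, 1)) * 0.1
b = np.zeros(PARAM)

def generate_parameters(target_return: float, n_samples: int = 4) -> np.ndarray:
    """Sample flattened network parameters conditioned on a desired return."""
    z = rng.normal(size=(n_samples, LATENT))       # random noise input
    cond = np.full((n_samples, 1), target_return)  # performance conditioning
    return np.tanh(z @ W_z.T + cond @ W_r.T + b)   # bounded parameter draws

thetas = generate_parameters(target_return=1.0)
print(thetas.shape)  # (4, 64): four candidate parameter vectors
```

Each row would then be reshaped into the layers of the target architecture and deployed without fine-tuning.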

Pattern Recognition Letters, Volume 185, Pages 272-278.
Citations: 0
An unsupervised video anomaly detection method via Optical Flow decomposition and Spatio-Temporal feature learning
IF 3.9 · CAS Tier 3 (Computer Science) · Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE · Pub Date: 2024-09-01 · DOI: 10.1016/j.patrec.2024.08.013
Jin Fan, Yuxiang Ji, Huifeng Wu, Yan Ge, Danfeng Sun, Jia Wu

This paper presents an unsupervised video anomaly detection method using Optical Flow decomposition and Spatio-Temporal feature learning (OFST). The method combines optical flow reconstruction with video frame prediction and achieves satisfactory results. The proposed OFST framework is composed of two modules: the Multi-Granularity Memory-augmented Autoencoder with Optical Flow Decomposition (MG-MemAE-OFD) and a Two-Stream Network based on Spatio-Temporal feature learning (TSN-ST). The MG-MemAE-OFD module comprises three functional blocks: optical flow decomposition, an autoencoder, and multi-granularity memory networks. The optical flow decomposition block extracts the main motion information of objects in optical flow, and the multi-granularity memory networks memorize normal patterns and improve the quality of the reconstructions. To predict video frames, the TSN-ST adopts parallel standard Transformer blocks and a temporal block to learn spatio-temporal features from video frames and optical flows. OFST combines these two modules so that the prediction error of abnormal samples is further increased by their larger reconstruction error, whereas normal samples obtain both a lower reconstruction error and a lower prediction error; the anomaly detection capability of the method is therefore greatly enhanced. We evaluated the proposed model on public datasets. In terms of the area under the curve (AUC), our model achieved 85.74% on the Ped1 dataset, 99.62% on Ped2, 93.89% on Avenue, and 76.0% on ShanghaiTech, an average improvement of 1.2% over the current state-of-the-art.
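The frame-level AUC reported above can be computed from fused per-frame errors. A minimal sketch with toy numbers follows; the additive error fusion and the toy scores are assumptions for illustration, not the paper's exact scheme.

```python
import numpy as np

def roc_auc(scores, labels):
    """Frame-level AUC via the Mann-Whitney rank statistic."""
    scores, labels = np.asarray(scores, float), np.asarray(labels, int)
    order = scores.argsort()
    ranks = np.empty(len(scores), dtype=float)
    ranks[order] = np.arange(1, len(scores) + 1)
    for s in np.unique(scores):           # average ranks over ties
        mask = scores == s
        ranks[mask] = ranks[mask].mean()
    pos, neg = labels.sum(), (1 - labels).sum()
    return (ranks[labels == 1].sum() - pos * (pos + 1) / 2) / (pos * neg)

# Illustrative per-frame errors (not real model outputs): anomalous frames
# tend to have both larger reconstruction and larger prediction error.
recon_err = np.array([0.10, 0.20, 0.15, 0.90, 1.10, 0.12])
pred_err  = np.array([0.20, 0.10, 0.25, 1.20, 0.80, 0.18])
labels    = np.array([0,    0,    0,    1,    1,    0])  # 1 = anomalous

score = recon_err + pred_err  # simple fusion; the weighting is a design choice
print(round(roc_auc(score, labels), 3))  # 1.0 on this toy separable example
```

Higher combined error on anomalous frames directly raises the AUC, which is the mechanism the abstract describes.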

Pattern Recognition Letters, Volume 185, Pages 239-246.
Citations: 0
Recent Advances in Deep Learning Model Security
IF 3.9 · CAS Tier 3 (Computer Science) · Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE · Pub Date: 2024-09-01 · DOI: 10.1016/j.patrec.2024.08.018
Guorui Feng, Sheng Li, Jian Zhao, Zheng Wang
Pattern Recognition Letters, Volume 185, Pages 262-263.
Citations: 0
Contrastive representation enhancement and learning for handwritten mathematical expression recognition
IF 3.9 · CAS Tier 3 (Computer Science) · Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE · Pub Date: 2024-08-30 · DOI: 10.1016/j.patrec.2024.08.021
Zihao Lin, Jinrong Li, Gang Dai, Tianshui Chen, Shuangping Huang, Jianmin Lin

Handwritten mathematical expression recognition (HMER) is an appealing task due to its wide applications and research challenges. Previous deep learning-based methods used a string decoder to emphasize expression symbol awareness and achieved considerable recognition performance. However, these methods still struggle to recognize handwritten symbols with varying appearance, where large appearance variations lead to ambiguous symbol representations. Our intuition is therefore to employ printed expressions, whose appearance is uniform, as templates for handwritten expressions, alleviating the effects of varying symbol appearance. In this paper, we propose a contrastive learning method in which handwritten symbols with identical semantics are clustered together under the guidance of printed symbols, leading the model to learn more robust symbol semantic representations. Specifically, we propose an anchor generation scheme to obtain printed expression images corresponding to handwritten expressions. We propose a contrastive learning objective, termed Semantic-NCE Loss, to pull together printed and handwritten symbols with identical semantics. Moreover, we employ a string decoder to parse the calibrated semantic representations, outputting satisfactory expression symbols. Experimental results on the benchmark datasets CROHME 14/16/19 demonstrate that our method noticeably improves the recognition accuracy of handwritten expressions and outperforms standard string decoder methods.
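An InfoNCE-style objective of the kind described above can be sketched as follows. This is a hedged approximation: the paper's Semantic-NCE Loss may differ in detail, and the embedding dimensions, temperature, and toy data are assumptions.

```python
import numpy as np

def semantic_nce(handwritten, printed, tau=0.1):
    """InfoNCE-style sketch: each handwritten embedding is pulled toward the
    printed embedding of the same symbol (the positive, on the diagonal) and
    pushed from printed embeddings of other symbols (the negatives)."""
    h = handwritten / np.linalg.norm(handwritten, axis=1, keepdims=True)
    p = printed / np.linalg.norm(printed, axis=1, keepdims=True)
    logits = h @ p.T / tau                         # temperature-scaled cosines
    logits -= logits.max(axis=1, keepdims=True)    # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.diag(log_prob).mean()               # positives on the diagonal

rng = np.random.default_rng(0)
printed = rng.normal(size=(5, 16))                   # one template per symbol
aligned = printed + 0.01 * rng.normal(size=(5, 16))  # handwritten near template
shuffled = np.roll(printed, 1, axis=0)               # paired with wrong class
print(semantic_nce(aligned, printed) < semantic_nce(shuffled, printed))  # True
```

The loss is low when each handwritten symbol sits near its printed template and high when pairs are mismatched, which is exactly the clustering pressure the method relies on.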

Pattern Recognition Letters, Volume 186, Pages 14-20.
Citations: 0
Polynomial kernel learning for interpolation kernel machines with application to graph classification
IF 3.9 · CAS Tier 3 (Computer Science) · Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE · Pub Date: 2024-08-30 · DOI: 10.1016/j.patrec.2024.08.022
Jiaqi Zhang, Cheng-Lin Liu, Xiaoyi Jiang

Because they interpolate all training data, interpolating classifiers have zero training error. Recent work nevertheless provides compelling reasons to investigate these classifiers, including their significance for ensemble methods. Interpolation kernel machines, which belong to the class of interpolating classifiers, generalize well and have proven to be an effective substitute for support vector machines, particularly for graph classification. In this work, we further enhance their performance by studying multiple kernel learning. To this end, we propose a general scheme of polynomially combined kernel functions, employing both quadratic and cubic kernel combinations in our experiments. Our findings demonstrate that this approach improves performance compared to individual graph kernels. Our work supports the use of interpolation kernel machines as an alternative to support vector machines, thereby contributing to greater methodological diversity.
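A cubic polynomial kernel combination and the exact-interpolation property can be sketched together. The base kernel, combination weights, and data here are illustrative assumptions; the paper learns the combination and works with graph kernels rather than an RBF on vectors.

```python
import numpy as np

def rbf(X, Y, gamma=0.5):
    """Gaussian base kernel (a stand-in for a graph kernel)."""
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def poly_combined(K, weights=(0.5, 0.3, 0.2)):
    """Cubic polynomial combination w1*K + w2*K^2 + w3*K^3 using elementwise
    powers; each power is itself a valid kernel by the Schur product theorem.
    The weights are illustrative, not learned as in the paper."""
    return sum(w * K ** (i + 1) for i, w in enumerate(weights))

rng = np.random.default_rng(0)
X, y = rng.normal(size=(20, 3)), rng.normal(size=20)

# An interpolation kernel machine fits training data exactly: solve K a = y.
K = poly_combined(rbf(X, X))
alpha = np.linalg.solve(K, y)
print(np.allclose(K @ alpha, y))  # True: zero training error by construction
```

Prediction on new points uses the same combined kernel: `poly_combined(rbf(X_new, X)) @ alpha`.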

Pattern Recognition Letters, Volume 186, Pages 7-13.
Citations: 0
Cross-attention based dual-similarity network for few-shot learning
IF 3.9 · CAS Tier 3 (Computer Science) · Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE · Pub Date: 2024-08-28 · DOI: 10.1016/j.patrec.2024.08.019
Chan Sim, Gyeonghwan Kim

Few-shot classification is the challenging task of recognizing unseen classes from limited data. Following the success of the Vision Transformer across large-scale image recognition domains, recent few-shot classification methods have adopted transformer-style architectures. However, most of them focus only on cross-attention between support and query sets, mainly considering channel-similarity. To address this issue, we introduce the dual-similarity network (DSN), in which attention maps for the same target within a class are made identical. With this network, we seek effective training through the integration of channel-similarity and map-similarity. Our method, while focused on N-way K-shot scenarios, also demonstrates strong performance in 1-shot settings through augmentation. Experimental results verify the effectiveness of DSN on widely used benchmark datasets.
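The idea of combining two similarity terms can be sketched with simple pooling in place of the paper's cross-attention. Everything here is an assumption for illustration: the pooling operations, the weighting `w`, and the toy feature maps are not the paper's construction.

```python
import numpy as np

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

def dual_similarity(support, query, w=0.5):
    """Hedged sketch of a dual similarity: a channel-similarity term (cosine
    between per-channel descriptors) combined with a map-similarity term
    (cosine between spatial maps). The paper computes these via
    cross-attention; this toy version uses plain pooling. Shapes: (C, H, W)."""
    C = support.shape[0]
    chan = cosine(support.reshape(C, -1).mean(1), query.reshape(C, -1).mean(1))
    spat = cosine(support.mean(0).ravel(), query.mean(0).ravel())
    return w * chan + (1 - w) * spat

rng = np.random.default_rng(0)
proto = np.abs(rng.normal(size=(8, 5, 5)))   # support prototype feature map
same = 1.1 * proto                           # same class, rescaled activation
other = np.abs(rng.normal(size=(8, 5, 5)))   # a different class
print(dual_similarity(proto, same) > dual_similarity(proto, other))  # True
```

Classification then amounts to assigning the query to the class whose support prototype yields the highest combined similarity.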

Pattern Recognition Letters, Volume 186, Pages 1-6.
Citations: 0
Scale-aware token-matching for transformer-based object detector
IF 3.9 · CAS Tier 3 (Computer Science) · Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE · Pub Date: 2024-08-23 · DOI: 10.1016/j.patrec.2024.08.006
Aecheon Jung, Sungeun Hong, Yoonsuk Hyun

Owing to advancements in deep learning, object detection has made significant progress in estimating the positions and classes of multiple objects within an image. However, detecting objects of various scales within a single image remains challenging. In this study, we propose scale-aware token matching to predict the positions and classes of objects in transformer-based object detection. Unlike previous methods that match without considering scale during training, we train the model by matching detection tokens with ground truth while taking object size into account. We divide one detection token set into multiple sets based on scale and match each token set with ground truth differently, thereby training the model without additional computation costs. The experimental results demonstrate that scale information can be assigned to tokens. Scale-aware tokens can independently learn scale-specific information through a novel loss function, which improves detection performance on small objects.
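The routing step (which scale-specific token set a ground-truth box is matched against) can be sketched as follows. The COCO-style area thresholds and the `route_to_token_set` helper are illustrative assumptions, not the paper's exact partition.

```python
# Hedged sketch of scale-aware matching: detection tokens are split into
# scale-specific sets, and each ground-truth box is routed to the token set
# responsible for objects of its size.
SMALL, LARGE = 32 ** 2, 96 ** 2  # COCO-style area thresholds (illustrative)

def route_to_token_set(box):
    """box = (x1, y1, x2, y2) in pixels; returns the scale-set index."""
    area = (box[2] - box[0]) * (box[3] - box[1])
    if area < SMALL:
        return 0   # tokens responsible for small objects
    if area < LARGE:
        return 1   # tokens responsible for medium objects
    return 2       # tokens responsible for large objects

boxes = [(0, 0, 10, 10), (0, 0, 50, 50), (0, 0, 200, 200)]
print([route_to_token_set(b) for b in boxes])  # [0, 1, 2]
```

Bipartite matching is then run separately inside each token set, so no token ever competes for a box outside its scale range.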

Pattern Recognition Letters, Volume 185, Pages 197-202.
Citations: 0
Coding self-representative and label-relaxed hashing for cross-modal retrieval
IF 3.9 · CAS Tier 3 (Computer Science) · Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE · Pub Date: 2024-08-23 · DOI: 10.1016/j.patrec.2024.08.011
Lin Jiang, Jigang Wu, Shuping Zhao, Jiaxing Li

In cross-modal retrieval, most existing hashing-based methods consider only the relationships among feature representations to reduce the heterogeneity gap between modalities, while neglecting the correlation between feature representations and the corresponding labels. This loses significant semantic information and degrades the class discriminability of the model. To tackle these issues, this paper presents a novel cross-modal retrieval method called coding self-representative and label-relaxed hashing (CSLRH). Specifically, we propose a self-representation learning term to enhance class-specific feature representations and reduce noise interference. Additionally, we introduce a label-relaxed regression to establish semantic relations between the hash codes and the label information, aiming to enhance semantic discriminability. Moreover, we incorporate a non-linear regression to capture the correlation of non-linear features in hash codes for cross-modal retrieval. Experimental results on three widely used datasets verify the effectiveness of our proposed method, which generates more discriminative hash codes and improves the precision of cross-modal retrieval.
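The retrieval mechanics common to such hashing methods (binarize projections into a shared Hamming space, then rank by Hamming distance) can be sketched as below. The random projection `W` and perfectly paired toy features are assumptions; in CSLRH the projections are learned with the self-representation and label-relaxed terms.

```python
import numpy as np

def hash_codes(X, W):
    """Project features into a shared Hamming space and binarize. Here W is
    a random stand-in shared across modalities; the paper learns it."""
    return (X @ W > 0).astype(np.uint8)

def hamming_rank(query_code, db_codes):
    """Rank database items by Hamming distance to a query code."""
    d = (db_codes != query_code).sum(axis=1)
    return np.argsort(d, kind="stable"), d

rng = np.random.default_rng(0)
W = rng.normal(size=(32, 16))     # 16-bit hash codes
imgs = rng.normal(size=(5, 32))   # image-modality features (database)
txts = imgs.copy()                # perfectly paired text features (queries)

order, dist = hamming_rank(hash_codes(txts, W)[2], hash_codes(imgs, W))
print(dist[2])  # 0: the true cross-modal match sits at Hamming distance 0
```

Hamming distances over short binary codes are what make retrieval fast; the learning objective only shapes which bits get assigned.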

Pattern Recognition Letters, Volume 185, Pages 1-7.
Citations: 0
Contraction mapping of feature norms for data quality imbalance learning
IF 3.9 · CAS Tier 3 (Computer Science) · Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE · Pub Date: 2024-08-23 · DOI: 10.1016/j.patrec.2024.08.016
Weihua Liu, Xiabi Liu, Huiyu Li, Chaochao Lin

The popular softmax loss and its recent extensions have achieved great success in deep learning-based image classification. However, the data used to train image classifiers often exhibit a highly skewed distribution in quality: good-quality samples far outnumber low-quality ones. If this imbalance is ignored, low-quality data are hard to classify correctly. In this paper, through careful experiments on various applications with different deep neural networks, we discover a positive correlation between the quality of an image and its feature norm (L2-norm) learned from the softmax loss. Based on this finding, we propose a contraction mapping function that compresses the range of feature norms of training images according to their quality, and we embed this function into the softmax loss and its extensions to produce novel learning objectives. Experiments on various applications, including handwritten digit recognition, lung nodule classification, and face recognition, demonstrate that the proposed approach effectively handles quality-imbalanced data and yields significant, stable improvements in classification accuracy. The code is available at https://github.com/Huiyu-Li/CM-M-Softmax-Loss.
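A contraction mapping on feature norms can be sketched as below. The tanh-based squashing and the `[r_min, r_max]` range are assumptions chosen to illustrate the idea; the paper's exact mapping may differ.

```python
import numpy as np

def contract_norms(F, r_min=5.0, r_max=10.0):
    """Hedged sketch: rescale each feature vector so its L2 norm is squashed
    into (r_min, r_max) via tanh, before computing softmax logits. The point
    is that low-quality (small-norm) and high-quality (large-norm) samples
    end up with comparable norms, compressing the quality imbalance."""
    norms = np.linalg.norm(F, axis=1, keepdims=True)
    target = r_min + (r_max - r_min) * np.tanh(norms / r_max)
    return F / norms * target

rng = np.random.default_rng(0)
# Feature rows spanning three orders of magnitude in norm (quality proxy).
F = rng.normal(size=(6, 8)) * np.array([[0.1], [0.5], [1], [5], [20], [100]])
new_norms = np.linalg.norm(contract_norms(F), axis=1)
print(new_norms.min() >= 5.0 and new_norms.max() <= 10.0)  # True
```

The contracted features would then feed the usual softmax (or margin-based) logits, so the loss no longer lets norm disparities dominate training.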

Pattern Recognition Letters, Volume 185, Pages 232-238.
Citations: 0
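The abstract above describes the contraction mapping only at a high level; the exact function is given in the paper. As a minimal pure-Python sketch, assume a hypothetical linear contraction that pulls each feature vector's L2-norm a fraction `alpha` of the way toward a shared `target` norm (the names `contract_norm`, `target`, and `alpha` are illustrative, not from the paper):

```python
import math

def l2_norm(v):
    """Euclidean (L2) norm of a feature vector given as a list of floats."""
    return math.sqrt(sum(x * x for x in v))

def contract_norm(v, target=10.0, alpha=0.5):
    """Rescale feature vector v so its L2-norm moves a fraction `alpha`
    of the way toward `target`. Applied across a batch, this shrinks the
    spread of feature norms by the factor (1 - alpha) -- a contraction --
    while preserving each feature's direction."""
    n = l2_norm(v)
    new_n = n + alpha * (target - n)  # linear contraction toward target
    return [x * (new_n / n) for x in v]

# Two features with very different norms (a proxy for image quality):
lo = contract_norm([3.0, 4.0])    # norm 5  -> contracted to 7.5
hi = contract_norm([9.0, 12.0])   # norm 15 -> contracted to 12.5
```

After contraction the norm gap shrinks from 10 to 5; in the paper's setting, such contracted features would replace the raw features inside softmax (or margin-based) cross-entropy objectives.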
Meta-learning from learning curves for budget-limited algorithm selection 根据学习曲线进行元学习,以选择预算有限的算法
IF 3.9 3区 计算机科学 Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date : 2024-08-22 DOI: 10.1016/j.patrec.2024.08.010
Manh Hung Nguyen, Lisheng Sun Hosoya, Isabelle Guyon

Training a large set of machine learning algorithms to convergence in order to select the best-performing one for a dataset is computationally wasteful. Moreover, in a budget-limited scenario, it is crucial to carefully select an algorithm candidate and allocate a budget for training it, ensuring that the limited budget is optimally distributed to favor the most promising candidates. Casting this problem as a Markov Decision Process, we propose a novel framework in which an agent must select the most promising algorithm during the learning process, without waiting until it is fully trained. At each time step, given an observation of the partial learning curves of the algorithms, the agent must decide whether to allocate resources to further train the most promising algorithm (exploitation), to wake up another algorithm previously put to sleep, or to start training a new algorithm (exploration). In addition, our framework allows the agent to meta-learn from learning curves on past datasets, along with dataset meta-features and algorithm hyperparameters. By incorporating meta-learning, we aim to avoid myopic decisions based solely on premature learning curves on the dataset at hand. We introduce two benchmarks of learning curves that were used in international competitions at WCCI’22 and AutoML-conf’22, and we analyze their results. Our findings show that both meta-learning and the progression of learning curves enhance the algorithm selection process, as evidenced by the winning teams’ methods and our DDQN baseline, compared to heuristic baselines or a random search. Interestingly, our cost-effective baseline, which selects the best-performing algorithm w.r.t. a small budget, can perform decently when learning curves do not intersect frequently.

Pattern Recognition Letters, Volume 185, Pages 225-231.
Citations: 0
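The budget-limited selection loop can be sketched in a few lines. This is not the paper's DDQN agent; it is a hypothetical greedy baseline (warm-start every candidate for one step, then always advance the current leader) that illustrates both the exploration/exploitation split and the myopia that meta-learning is meant to correct. The function name, the example curves, and the budget values are all illustrative:

```python
def select_under_budget(curves, budget):
    """curves: algorithm name -> its full (hidden) learning curve.
    The agent only observes the prefix it has paid training steps for.
    Assumes budget >= number of candidates for the warm-start pass."""
    progress = {name: 0 for name in curves}  # steps trained so far
    seen = {}                                # best score observed per algorithm
    for name in curves:                      # exploration: warm-start everyone
        if budget == 0:
            break
        seen[name] = curves[name][0]
        progress[name] = 1
        budget -= 1
    while budget > 0:                        # exploitation: advance the leader
        runnable = [n for n in seen if progress[n] < len(curves[n])]
        if not runnable:
            break
        best = max(runnable, key=seen.get)
        seen[best] = max(seen[best], curves[best][progress[best]])
        progress[best] += 1
        budget -= 1
    return max(seen, key=seen.get)

curves = {"fast_start": [0.50, 0.55, 0.60],
          "slow_start": [0.20, 0.40, 0.90]}
small = select_under_budget(curves, budget=4)   # -> "fast_start" (myopic pick)
large = select_under_budget(curves, budget=10)  # -> "slow_start" (curves cross)
```

With a tight budget the greedy rule commits to the early leader even though the curves cross later, which is exactly the premature-curve myopia that a meta-learned agent, trained on curve shapes from past datasets, is designed to avoid.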