Pattern Recognition Letters: latest articles (page 9)

Cross-attention based dual-similarity network for few-shot learning

IF 3.9, Zone 3 (Computer Science), Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Pattern Recognition Letters

Pub Date : 2024-08-28 DOI: 10.1016/j.patrec.2024.08.019

Chan Sim, Gyeonghwan Kim

Few-shot classification is the challenging task of recognizing unseen classes from limited data. Following the success of the Vision Transformer in image recognition on various large-scale datasets, recent few-shot classification methods employ transformer-style architectures. However, most of them focus only on cross-attention between support and query sets, mainly considering channel-similarity. To address this issue, we introduce the dual-similarity network (DSN), in which attention maps for the same target within a class are made identical. With this network, we seek effective training through the integration of channel-similarity and map-similarity. Our method, while focused on $N$-way $K$-shot scenarios, also demonstrates strong performance in 1-shot settings through augmentation. The experimental results verify the effectiveness of DSN on widely used benchmark datasets.

Citations: 0

Scale-aware token-matching for transformer-based object detector

IF 3.9, Zone 3 (Computer Science), Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Pattern Recognition Letters

Pub Date : 2024-08-23 DOI: 10.1016/j.patrec.2024.08.006

Aecheon Jung , Sungeun Hong , Yoonsuk Hyun

Owing to advances in deep learning, object detection has made significant progress in estimating the positions and classes of multiple objects within an image. However, detecting objects of various scales within a single image remains a challenging problem. In this study, we propose scale-aware token matching to predict the positions and classes of objects for transformer-based object detection. We train a model by matching detection tokens with ground truth while considering object size, unlike previous methods that performed matching without considering scale during training. We divide one detection token set into multiple sets based on scale and match each token set differently with ground truth, thereby training the model without additional computation cost. The experimental results demonstrate that scale information can be assigned to tokens. Scale-aware tokens can independently learn scale-specific information by using a novel loss function, which improves the detection performance on small objects.
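
The matching idea described in this abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the two-way split into "small" and "large" tokens, the area threshold, the L1 box cost, and the function names (`scale_aware_match`, `match_group`) are all hypothetical, and a brute-force search stands in for the Hungarian algorithm normally used in transformer detectors.

```python
from itertools import permutations


def box_area(box):
    """Area of an (x1, y1, x2, y2) box."""
    x1, y1, x2, y2 = box
    return max(0.0, x2 - x1) * max(0.0, y2 - y1)


def l1_cost(pred, gt):
    """Hypothetical matching cost: L1 distance between box coordinates."""
    return sum(abs(p - g) for p, g in zip(pred, gt))


def match_group(preds, gts):
    """Brute-force minimum-cost one-to-one matching (only viable for tiny sets).

    Returns (pred_index, gt_index) pairs; assumes len(gts) <= len(preds).
    """
    best, best_cost = [], float("inf")
    for perm in permutations(range(len(preds)), len(gts)):
        cost = sum(l1_cost(preds[p], gts[g]) for g, p in enumerate(perm))
        if cost < best_cost:
            best_cost, best = cost, [(p, g) for g, p in enumerate(perm)]
    return best


def scale_aware_match(token_preds, gts, area_threshold, n_small):
    """Assumption: the first n_small tokens are matched only to small boxes,
    the remaining tokens only to large boxes."""
    small_gt = [i for i, g in enumerate(gts) if box_area(g) < area_threshold]
    large_gt = [i for i, g in enumerate(gts) if box_area(g) >= area_threshold]
    groups = [
        (list(range(n_small)), small_gt),
        (list(range(n_small, len(token_preds))), large_gt),
    ]
    matches = []
    for token_ids, gt_ids in groups:
        local = match_group(
            [token_preds[t] for t in token_ids], [gts[g] for g in gt_ids]
        )
        matches += [(token_ids[p], gt_ids[g]) for p, g in local]
    return sorted(matches)
```

In this sketch a token in the small-object group is never matched to a large box, even when its prediction is closer to one, which is one way tokens could end up carrying scale information.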

Volume 185, Pages 197-202. Open-access PDF: https://www.sciencedirect.com/science/article/pii/S0167865524002381/pdfft?md5=455cf43c88bbb69d1fdd489f7d4c3fe2&pid=1-s2.0-S0167865524002381-main.pdf

Citations: 0

Coding self-representative and label-relaxed hashing for cross-modal retrieval

IF 3.9, Zone 3 (Computer Science), Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Pattern Recognition Letters

Pub Date : 2024-08-23 DOI: 10.1016/j.patrec.2024.08.011

Lin Jiang , Jigang Wu , Shuping Zhao , Jiaxing Li

In cross-modal retrieval, most existing hashing-based methods consider only the relationship between feature representations to reduce the heterogeneous gap between data from different modalities, and neglect the correlation between feature representations and the corresponding labels. This leads to the loss of significant semantic information and degrades the class discriminability of the model. To tackle these issues, this paper presents a novel method called coding self-representative and label-relaxed hashing (CSLRH) for cross-modal retrieval. Specifically, we propose a self-representation learning term to enhance the class-specific feature representations and reduce noise interference. Additionally, we introduce a label-relaxed regression to establish semantic relations between the hash codes and the label information, aiming to enhance semantic discriminability. Moreover, we incorporate a non-linear regression to capture the correlation of non-linear features in hash codes for cross-modal retrieval. Experimental results on three widely used datasets verify the effectiveness of our proposed method, which can generate more discriminative hash codes to improve the precision of cross-modal retrieval.

Volume 185, Pages 1-7.

Citations: 0

Contraction mapping of feature norms for data quality imbalance learning

IF 3.9, Zone 3 (Computer Science), Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Pattern Recognition Letters

Pub Date : 2024-08-23 DOI: 10.1016/j.patrec.2024.08.016

Weihua Liu , Xiabi Liu , Huiyu Li , Chaochao Lin

The popular softmax loss and its recent extensions have achieved great success in deep learning-based image classification. However, the data for training image classifiers often exhibit a highly skewed distribution in quality, i.e., high-quality samples far outnumber low-quality ones. If this problem is ignored, low-quality data are hard to classify correctly. In this paper, we discover a positive correlation between the quality of an image and its feature norm ($L_{2}$-norm) learned from softmax loss through careful experiments on various applications with different deep neural networks. Based on this finding, we propose a contraction mapping function to compress the range of feature norms of training images according to their quality and embed this contraction mapping function into softmax loss and its extensions to produce novel learning objectives. Experiments on various applications, including handwritten digit recognition, lung nodule classification, and face recognition, demonstrate that the proposed approach is a promising way to deal with the problem of learning from quality-imbalanced data and leads to significant and stable improvements in classification accuracy. The code is available at https://github.com/Huiyu-Li/CM-M-Softmax-Loss.
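
To make "compressing the range of feature norms" concrete, here is a minimal sketch. The abstract does not give the paper's contraction function, so the affine map g(r) = c + k(r - c) with 0 < k < 1 used below is an assumption chosen only because it is the simplest contraction; the names `contract_norm` and `rescale_feature` and the parameters `center` and `k` are hypothetical.

```python
import math


def l2_norm(v):
    """Euclidean norm of a feature vector."""
    return math.sqrt(sum(x * x for x in v))


def contract_norm(r, center, k):
    """Affine contraction g(r) = center + k * (r - center), with 0 < k < 1.

    Since |g(a) - g(b)| = k * |a - b|, any interval of norms shrinks by factor k.
    """
    if not 0.0 < k < 1.0:
        raise ValueError("k must lie strictly between 0 and 1")
    return center + k * (r - center)


def rescale_feature(v, center, k):
    """Keep the direction of v but replace its norm with the contracted norm."""
    r = l2_norm(v)
    if r == 0.0:
        return list(v)  # a zero vector has no direction to preserve
    scale = contract_norm(r, center, k) / r
    return [x * scale for x in v]
```

With `center=10.0` and `k=0.5`, a low-norm feature `[3.0, 4.0]` (norm 5) is rescaled to norm 7.5 while a feature of norm 10 is left unchanged, so the gap between the two norms halves.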

Volume 185, Pages 232-238.

Citations: 0

Meta-learning from learning curves for budget-limited algorithm selection

IF 3.9, Zone 3 (Computer Science), Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Pattern Recognition Letters

Pub Date : 2024-08-22 DOI: 10.1016/j.patrec.2024.08.010

Manh Hung Nguyen , Lisheng Sun Hosoya , Isabelle Guyon

Training a large set of machine learning algorithms to convergence in order to select the best-performing algorithm for a dataset is computationally wasteful. Moreover, in a budget-limited scenario, it is crucial to carefully select an algorithm candidate and allocate a budget for training it, ensuring that the limited budget is optimally distributed to favor the most promising candidates. Casting this problem as a Markov Decision Process, we propose a novel framework in which an agent must select the most promising algorithm during the learning process, without waiting until it is fully trained. At each time step, given an observation of partial learning curves of algorithms, the agent must decide whether to allocate resources to further train the most promising algorithm (exploitation), to wake up another algorithm previously put to sleep, or to start training a new algorithm (exploration). In addition, our framework allows the agent to meta-learn from learning curves on past datasets along with dataset meta-features and algorithm hyperparameters. By incorporating meta-learning, we aim to avoid myopic decisions based solely on premature learning curves on the dataset at hand. We introduce two benchmarks of learning curves that were used in international competitions at WCCI’22 and AutoML-conf’22, and we analyze the results. Our findings show that both meta-learning and the progression of learning curves enhance the algorithm selection process, as evidenced by the methods of the winning teams and our DDQN baseline, compared to heuristic baselines or a random search. Interestingly, our cost-effective baseline, which selects the best-performing algorithm with respect to a small budget, can perform decently when learning curves do not intersect frequently.
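
The "cost-effective baseline" mentioned at the end of this abstract can be illustrated with a short sketch. The abstract does not specify the baseline beyond selecting the best algorithm under a small budget, so the probe-then-commit procedure, the unit-cost training steps, and the name `select_with_small_budget` are assumptions made for illustration.

```python
def select_with_small_budget(learning_curves, probe_steps, total_budget):
    """Probe every algorithm briefly, then spend what is left on the probe winner.

    learning_curves: dict mapping algorithm name to a list of scores, where
    index t holds the score after t + 1 units of training (assumed unit cost).
    Returns (chosen algorithm, its score when the budget runs out).
    """
    n_algorithms = len(learning_curves)
    if probe_steps < 1 or probe_steps * n_algorithms > total_budget:
        raise ValueError("budget too small to probe every algorithm")
    # Score of each algorithm after the short probe.
    probe_scores = {
        name: curve[probe_steps - 1] for name, curve in learning_curves.items()
    }
    winner = max(probe_scores, key=probe_scores.get)
    # Whatever budget remains goes to the winner, capped by its curve length.
    remaining = total_budget - probe_steps * n_algorithms
    final_step = min(probe_steps + remaining, len(learning_curves[winner]))
    return winner, learning_curves[winner][final_step - 1]
```

When curves cross, a short probe can commit to the wrong algorithm: with a fast-saturating curve and a slow-starting but ultimately better one, a one-step probe picks the former. This is consistent with the abstract's remark that the baseline does well mainly when learning curves do not intersect frequently.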

Volume 185, Pages 225-231.

Citations: 0

Deep NRSFM for multi-view multi-body pose estimation

IF 3.9, Zone 3 (Computer Science), Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Pattern Recognition Letters

Pub Date : 2024-08-22 DOI: 10.1016/j.patrec.2024.08.015

Áron Fóthi, Joul Skaf, Fengjiao Lu, Kristian Fenech

This paper addresses the challenging task of unsupervised relative human pose estimation. Our solution exploits the potential offered by utilizing multiple uncalibrated cameras. It is assumed that spatial human pose and camera parameter estimation can be solved as a block sparse dictionary learning problem with zero supervision. The resulting structures and camera parameters can fit individual skeletons into a common space. To do so, we exploit the fact that all individuals in the image are viewed from the same camera viewpoint, thus exploiting the information provided by multiple camera views and overcoming the lack of information on camera parameters. To the best of our knowledge, this is the first solution that requires neither 3D ground truth nor knowledge of the intrinsic or extrinsic camera parameters. Our approach demonstrates the potential of using multiple viewpoints to solve challenging computer vision problems. Additionally, we provide access to the code to encourage further development and experimentation: https://github.com/Jeryoss/MVMB-NRSFM.

Volume 185, Pages 218-224. Open-access PDF: https://www.sciencedirect.com/science/article/pii/S0167865524002472/pdfft?md5=c7f415f86c9c99693c29d66ef080962f&pid=1-s2.0-S0167865524002472-main.pdf

Citations: 0

Saliency-based video summarization for face anti-spoofing

IF 3.9, Zone 3 (Computer Science), Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Pattern Recognition Letters

Pub Date : 2024-08-22 DOI: 10.1016/j.patrec.2024.08.008

Usman Muhammad , Mourad Oussalah , Jorma Laaksonen

With the growing availability of databases for face presentation attack detection, researchers are increasingly focusing on video-based face anti-spoofing methods that involve hundreds to thousands of images for training the models. However, there is currently no clear consensus on the optimal number of frames in a video to improve face spoofing detection. Inspired by visual saliency theory, we present a video summarization method for face anti-spoofing that aims to enhance the performance and efficiency of deep learning models by leveraging visual saliency. In particular, saliency information is extracted from the differences between the Laplacian and Wiener filter outputs of the source images, enabling the identification of the most visually salient regions within each frame. Subsequently, the source images are decomposed into base and detail images, enhancing the representation of the most important information. Weighting maps are then computed based on the saliency information, indicating the importance of each pixel in the image. By linearly combining the base and detail images using the weighting maps, the method fuses the source images to create a single representative image that summarizes the entire video. The key contribution of the proposed method lies in demonstrating how visual saliency can be used as a data-centric approach to improve the performance and efficiency of face presentation attack detection. By focusing on the most salient images or regions within the images, a more representative and diverse training set can be created, potentially leading to more effective models. To validate the method’s effectiveness, a simple CNN–RNN deep learning architecture was used, and the experimental results showed state-of-the-art performance on four challenging face anti-spoofing datasets.
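
The final fusion step described above, a per-pixel linear combination driven by weighting maps, can be sketched as follows. This is a simplified illustration under stated assumptions: it fuses whole frames rather than separate base and detail layers, takes the saliency maps as given instead of deriving them from Laplacian and Wiener filter outputs, and normalizes weights so they sum to one at each pixel. The name `fuse_frames` and the `eps` parameter are hypothetical.

```python
def fuse_frames(frames, saliency_maps, eps=1e-8):
    """Fuse several frames into one image using per-pixel saliency weights.

    frames, saliency_maps: lists of equally sized 2D lists (rows of floats).
    At each pixel the weights are the saliency values normalized across frames.
    """
    height, width = len(frames[0]), len(frames[0][0])
    fused = [[0.0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            # eps avoids division by zero where no frame is salient.
            total = sum(s[y][x] for s in saliency_maps) + eps
            fused[y][x] = sum(
                s[y][x] / total * f[y][x] for f, s in zip(frames, saliency_maps)
            )
    return fused
```

Pixels where one frame is much more salient than the others take their value mostly from that frame, which is how a single fused image can stand in for a whole clip.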

Volume 185, Pages 190-196.

Citations: 0

Graph neural collaborative filtering with medical content-aware pre-training for treatment pattern recommendation

IF 3.9, Zone 3 (Computer Science), Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Pattern Recognition Letters

Pub Date : 2024-08-22 DOI: 10.1016/j.patrec.2024.08.014

Xin Min , Wei Li , Ruiqi Han , Tianlong Ji , Weidong Xie

With the advancement of information technology in healthcare, electronic medical records (EMRs) have become the repository of patients’ treatment processes in hospitals, including the patient’s treatment pattern (standard treatment process), medical history, admission diagnosis, etc. In particular, EMR-based treatment recommendation systems have become critical for optimizing clinical decision-making. EMRs contain complex relationships between patients and treatment patterns. Recent studies have shown that graph neural collaborative filtering can effectively capture the complex relationships in EMRs. However, none of the existing methods takes into account the impact of medical content, such as the patient’s admission diagnosis and medical history, on treatment recommendations. In this work, we propose a graph neural collaborative filtering model with medical content-aware pre-training (CAPRec) for learning initial embeddings with medical content to improve recommendation performance. First, the model constructs a patient-treatment pattern interaction graph from EMR data. Then we use the medical content for pre-training and transfer the learned embeddings to a graph neural collaborative filtering model. Finally, the learned initial embeddings support the downstream task of graph collaborative filtering. Extensive experiments on real-world datasets consistently demonstrate the effectiveness of the medical content-aware training framework in improving treatment recommendations.

{"title":"Graph neural collaborative filtering with medical content-aware pre-training for treatment pattern recommendation","authors":"Xin Min , Wei Li , Ruiqi Han , Tianlong Ji , Weidong Xie","doi":"10.1016/j.patrec.2024.08.014","DOIUrl":"10.1016/j.patrec.2024.08.014","url":null,"abstract":"<div><p>Recently, considering the advancement of information technology in healthcare, electronic medical records (EMRs) have become the repository of patients’ treatment processes in hospitals, including the patient’s treatment pattern (standard treatment process), the patient’s medical history, the patient’s admission diagnosis, etc. In particular, EMRs-based treatment recommendation systems have become critical for optimizing clinical decision-making. EMRs contain complex relationships between patients and treatment patterns. Recent studies have shown that graph neural collaborative filtering can effectively capture the complex relationships in EMRs. However, none of the existing methods take into account the impact of medical content such as the patient’s admission diagnosis, and medical history on treatment recommendations. In this work, we propose a graph neural collaborative filtering model with medical content-aware pre-training (CAPRec) for learning initial embeddings with medical content to improve recommendation performance. First the model constructs a patient-treatment pattern interaction graph from EMRs data. Then we attempt to use the medical content for pre-training learning and transfer the learned embeddings to a graph neural collaborative filtering model. Finally, the learned initial embedding can support the downstream task of graph collaborative filtering. 
Extensive experiments on real world datasets have consistently demonstrated the effectiveness of the medical content-aware training framework in improving treatment recommendations.</p></div>","PeriodicalId":54638,"journal":{"name":"Pattern Recognition Letters","volume":"185 ","pages":"Pages 210-217"},"PeriodicalIF":3.9,"publicationDate":"2024-08-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142083773","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

Citations: 0

Swin-chart: An efficient approach for chart classification

IF 3.9 Zone 3 (Computer Science) Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Pattern Recognition Letters

Pub Date : 2024-08-22 DOI: 10.1016/j.patrec.2024.08.012

Anurag Dhote , Mohammed Javed , David S. Doermann

Charts are a visualization tool used in scientific documents to make the complex relationships underlying data and experiments easier to comprehend. Researchers use various chart types to convey scientific information, which makes data extraction and subsequent chart understanding very challenging. Many studies in the literature address the problem of chart mining, motivated by the need to facilitate the editing of existing charts, carry out extrapolative studies, and provide a deeper understanding of the underlying data. The first step towards chart understanding is chart classification, for which traditional ML and CNN-based deep learning models have been used in the literature. In this paper, we propose Swin-Chart, a Swin transformer-based deep learning approach for chart classification, which generalizes well across multiple datasets with a wide range of chart categories. Swin-Chart comprises a pre-trained Swin Transformer, a finetuning component, and a weight averaging component. The proposed approach is tested on five chart image benchmark datasets. We observed that the Swin-Chart model outperforms existing state-of-the-art models on all the datasets. Furthermore, we also provide an ablation study of the Swin-Chart model with all five datasets to understand the importance of various sub-parts such as the backbone Swin transformer model, the number of best weights selected for the weight averaging component, and the presence of the weight averaging component itself.

The Swin-Chart model also placed first in the chart classification task on the latest dataset in the CHART Infographics competition at ICDAR 2023 (chartinfo.github.io).
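The weight averaging component mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not say how checkpoints are ranked or combined, so this assumes a plain element-wise mean over the k checkpoints with the highest validation score. The function name `average_best_checkpoints`, the flat-list weight format, and the toy scores are all hypothetical and are not taken from the paper.

```python
from typing import Dict, List, Tuple

# Hypothetical checkpoint format: parameter name -> flattened parameter values.
Weights = Dict[str, List[float]]


def average_best_checkpoints(checkpoints: List[Tuple[float, Weights]], k: int) -> Weights:
    """Average the parameters of the k checkpoints with the highest validation score.

    Assumption: a uniform element-wise mean; the paper may weight or select differently.
    """
    if not 1 <= k <= len(checkpoints):
        raise ValueError("k must be between 1 and the number of checkpoints")
    # Rank checkpoints by validation score, best first, and keep the top k.
    best = sorted(checkpoints, key=lambda c: c[0], reverse=True)[:k]
    averaged: Weights = {}
    for name in best[0][1]:
        # Group the i-th value of this parameter across the selected checkpoints.
        columns = zip(*(weights[name] for _, weights in best))
        averaged[name] = [sum(col) / k for col in columns]
    return averaged


# Toy data: (validation score, weights) pairs saved during fine-tuning.
checkpoints = [
    (0.90, {"head.weight": [1.0, 2.0], "head.bias": [0.0]}),
    (0.95, {"head.weight": [3.0, 4.0], "head.bias": [1.0]}),
    (0.80, {"head.weight": [9.0, 9.0], "head.bias": [9.0]}),
]
print(average_best_checkpoints(checkpoints, k=2))
# {'head.weight': [2.0, 3.0], 'head.bias': [0.5]}
```

Varying `k` in a sketch like this corresponds to the ablation over how many best weights feed the averaging component; `k=1` reduces to simply keeping the best checkpoint.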


Volume 185, Pages 203-209

Citations: 0

Contrastive Learning for Lane Detection via cross-similarity

IF 3.9 Zone 3 (Computer Science) Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Pattern Recognition Letters

Pub Date : 2024-08-20 DOI: 10.1016/j.patrec.2024.08.007

Ali Zoljodi , Sadegh Abadijou , Mina Alibeigi , Masoud Daneshtalab

Detecting lane markings in road scenes is a significant challenge because the markings are intricate and easily degraded by unfavorable conditions. While lane markings have strong shape priors, their visibility is easily compromised by varying lighting conditions, adverse weather, occlusions by other vehicles or pedestrians, road plane changes, and fading of colors over time. The detection process is further complicated by the presence of several lane shapes and natural variations, necessitating large amounts of high-quality and diverse data to train a robust lane detection model capable of handling various real-world scenarios.

In this paper, we present a novel self-supervised learning method termed Contrastive Learning for Lane Detection via Cross-Similarity (CLLD) to enhance the resilience and effectiveness of lane detection models in real-world scenarios, particularly when the visibility of lane markings is compromised. CLLD introduces a novel contrastive learning (CL) method that assesses the similarity of local features within the global context of the input image. It uses the surrounding information to predict lane markings. This is achieved by integrating local feature contrastive learning with our newly proposed operation, dubbed cross-similarity.

The local feature CL concentrates on extracting features from small patches, a necessity for accurately localizing lane segments. Meanwhile, cross-similarity captures global features, enabling the detection of obscured lane segments based on their surroundings. We enhance cross-similarity by randomly masking portions of input images in the process of augmentation. Extensive experiments on TuSimple and CuLane benchmark datasets demonstrate that CLLD consistently outperforms state-of-the-art contrastive learning methods, particularly in visibility-impairing conditions like shadows, while it also delivers comparable results under normal conditions. When compared to supervised learning, CLLD still excels in challenging scenarios such as shadows and crowded scenes, which are common in real-world driving.
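The two ingredients named above, random masking during augmentation and similarity between local patch features, can be sketched in a few lines. The abstract does not define the cross-similarity operation, so the code below is not the authors' method: it assumes non-overlapping square patches zeroed at random and a plain cosine-similarity matrix between the patch features of two views. The names `mask_random_patches` and `similarity_matrix` are hypothetical.

```python
import math
import random
from typing import List

Image = List[List[float]]


def mask_random_patches(image: Image, patch: int, ratio: float, rng: random.Random) -> Image:
    """Return a copy of `image` with a random subset of patch x patch blocks set to zero."""
    height, width = len(image), len(image[0])
    # Top-left corners of all non-overlapping blocks.
    blocks = [(r, c) for r in range(0, height, patch) for c in range(0, width, patch)]
    n_masked = round(ratio * len(blocks))
    masked = [row[:] for row in image]
    for r0, c0 in rng.sample(blocks, n_masked):
        for r in range(r0, min(r0 + patch, height)):
            for c in range(c0, min(c0 + patch, width)):
                masked[r][c] = 0.0
    return masked


def cosine(u: List[float], v: List[float]) -> float:
    """Cosine similarity; defined as 0.0 when either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def similarity_matrix(feats_a: List[List[float]], feats_b: List[List[float]]) -> List[List[float]]:
    """Entry [i][j] compares local feature i of view A with local feature j of view B."""
    return [[cosine(u, v) for v in feats_b] for u in feats_a]


# Toy usage: mask half of the 2x2 blocks in a 4x4 image of ones.
image = [[1.0] * 4 for _ in range(4)]
view = mask_random_patches(image, patch=2, ratio=0.5, rng=random.Random(0))
print(sum(value == 0.0 for row in view for value in row))  # 8 pixels masked
```

In a contrastive setup, a matrix like this would typically feed a loss that pulls matching patches of two augmented views together; how CLLD combines local and global terms is specified in the paper, not here.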


Volume 185, Pages 175-183. Open access PDF: https://www.sciencedirect.com/science/article/pii/S0167865524002393/pdfft?md5=216ead31bb4d56cfb720a21ce2d4db87&pid=1-s2.0-S0167865524002393-main.pdf

Citations: 0