Pattern Recognition Letters: Latest Articles

Use estimated signal and noise to adjust step size for image restoration
IF 3.9 · CAS Tier 3, Computer Science · Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE · Pub Date: 2024-09-11 · DOI: 10.1016/j.patrec.2024.09.006
Min Zhang, Shupeng Liu, Taihao Li, Huai Chen, Xiaoyin Xu

Image deblurring is a challenging inverse problem, especially when there is additive noise in the observation. To solve such an inverse problem in an iterative manner, it is important to control the step size to achieve stable and robust performance. We designed a method that controls the progress of the iterative process in solving the inverse problem without the need for a user-specified step size. The method searches for an optimal step size under the assumption that the signal and noise are two independent stochastic processes. Experiments show that the method can achieve good performance in the presence of noise and imperfect knowledge about the blurring kernel. Tests also show that, for different blurring kernels and noise levels, the difference between two consecutive estimates given by the new method tends to remain more stable and stay in a smaller range than those given by some existing techniques. This stability makes the new method more robust in the sense that it is easier to select a stopping threshold for it across different scenarios.
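
A minimal sketch of how an independence-based step-size rule could look in practice is given below. Everything in it is an illustrative assumption: the Landweber-style update, the symmetric blurring kernel, the finite candidate grid, and the use of a sample correlation between the signal and noise estimates as the independence score are not taken from the paper.

```python
# Illustrative sketch (not the authors' code): pick the step size whose update
# leaves the estimated noise least correlated with the estimated signal,
# reflecting the assumption that signal and noise are independent processes.
import numpy as np
from scipy.signal import fftconvolve

def deblur_step(x, y, kernel, candidates=(0.1, 0.5, 1.0, 1.5)):
    """One Landweber-style update of estimate x toward observation y."""
    def blur(img):
        return fftconvolve(img, kernel, mode="same")
    # gradient H^T (y - H x); assumes a symmetric kernel so that H^T = H
    grad = blur(y - blur(x))
    best_x, best_score = x, np.inf
    for alpha in candidates:                      # hypothetical candidate grid
        x_new = x + alpha * grad
        noise_est = y - blur(x_new)               # residual as the noise estimate
        # independence of signal and noise implies a small sample correlation
        score = abs(np.corrcoef(x_new.ravel(), noise_est.ravel())[0, 1])
        if score < best_score:
            best_x, best_score = x_new, score
    return best_x
```

Iterating x = deblur_step(x, y, kernel) until the change between consecutive estimates falls below a stopping threshold mirrors the stable-difference behavior the abstract describes.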

{"title":"Use estimated signal and noise to adjust step size for image restoration","authors":"Min Zhang ,&nbsp;Shupeng Liu ,&nbsp;Taihao Li ,&nbsp;Huai Chen ,&nbsp;Xiaoyin Xu","doi":"10.1016/j.patrec.2024.09.006","DOIUrl":"10.1016/j.patrec.2024.09.006","url":null,"abstract":"<div><p>Image deblurring is a challenging inverse problem, especially when there is additive noise to the observation. To solve such an inverse problem in an iterative manner, it is important to control the step size for achieving a stable and robust performance. We designed a method that controls the progress of iterative process in solving the inverse problem without the need for a user-specified step size. The method searches for an optimal step size under the assumption that the signal and noise are two independent stochastic processes. Experiments show that the method can achieve good performance in the presence of noise and imperfect knowledge about the blurring kernel. Tests also show that, for different blurring kernels and noise levels, the difference between two consecutive estimates given by the new method tends to remain more stable and stay in a smaller range, as compared to those given by some existing techniques. This stable feature makes the new method more robust in the sense that it is easier to select a stopping threshold for the new method to use in different scenarios.</p></div>","PeriodicalId":54638,"journal":{"name":"Pattern Recognition Letters","volume":"186 ","pages":"Pages 57-63"},"PeriodicalIF":3.9,"publicationDate":"2024-09-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142171797","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
One-index vector quantization based adversarial attack on image classification
IF 3.9 · CAS Tier 3, Computer Science · Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE · Pub Date: 2024-09-06 · DOI: 10.1016/j.patrec.2024.09.001
Haiju Fan, Xiaona Qin, Shuang Chen, Hubert P. H. Shum, Ming Li

To improve storage and transmission, images are generally compressed. Vector quantization (VQ) is a popular compression method, as it achieves a compression ratio that surpasses other compression techniques. Despite this, existing adversarial attack methods on image classification are mostly performed in the pixel domain, with few exceptions in the compressed domain, making them less applicable in real-world scenarios. In this paper, we propose a novel one-index attack method in the VQ domain that generates adversarial images with a differential evolution algorithm, successfully causing image misclassification in victim models. The one-index attack method modifies a single index in the compressed data stream so that the decompressed image is misclassified. Because it only needs to modify a single VQ index to realize an attack, the number of perturbed indices is strictly limited. The proposed method is a semi-black-box attack, which is more in line with practical attack scenarios. We apply our method to attack three popular image classification models, i.e., ResNet, NIN, and VGG16. On average, 55.9% and 77.4% of the images in CIFAR-10 and Fashion-MNIST, respectively, are successfully attacked, with a high level of misclassification confidence and a low level of image perturbation.
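
The sketch below illustrates the single-index search on a toy setup. The helpers decode() and classify() are hypothetical stand-ins for a VQ decoder and a victim model, and the plain evolutionary loop is only a stand-in for the paper's differential evolution algorithm.

```python
# Toy sketch: find one (position, codeword) flip in the VQ index map that
# makes the decoded image misclassified. Assumes classify() returns a 1-D
# probability vector and decode() maps an index map + codebook to an image.
import numpy as np

def one_index_attack(indices, codebook, decode, classify, true_label,
                     pop=20, gens=50, seed=0):
    rng = np.random.default_rng(seed)
    n_pos, n_codes = indices.size, len(codebook)
    # each candidate: (flat position to modify, replacement codeword index)
    cand = np.stack([rng.integers(0, n_pos, pop),
                     rng.integers(0, n_codes, pop)], axis=1)

    def evaluate(c):
        mod = indices.ravel().copy()
        mod[c[0]] = c[1]                      # perturb exactly one VQ index
        probs = classify(decode(mod.reshape(indices.shape), codebook))
        return probs[true_label], int(np.argmax(probs)) != true_label

    for _ in range(gens):
        results = [evaluate(c) for c in cand]
        for c, (_, fooled) in zip(cand, results):
            if fooled:
                return c                      # successful single-index attack
        scores = np.array([s for s, _ in results])
        # crude evolution: copy random parents, re-randomize one gene each
        children = cand[rng.integers(0, pop, pop)].copy()
        gene = rng.integers(0, 2, pop)
        children[gene == 0, 0] = rng.integers(0, n_pos, int((gene == 0).sum()))
        children[gene == 1, 1] = rng.integers(0, n_codes, int((gene == 1).sum()))
        child_scores = np.array([evaluate(c)[0] for c in children])
        improved = child_scores < scores      # lower true-label prob is better
        cand[improved] = children[improved]
    return None                               # no attack found within budget
```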

{"title":"One-index vector quantization based adversarial attack on image classification","authors":"Haiju Fan ,&nbsp;Xiaona Qin ,&nbsp;Shuang Chen ,&nbsp;Hubert P. H. Shum ,&nbsp;Ming Li","doi":"10.1016/j.patrec.2024.09.001","DOIUrl":"10.1016/j.patrec.2024.09.001","url":null,"abstract":"<div><p>To improve storage and transmission, images are generally compressed. Vector quantization (VQ) is a popular compression method as it has a high compression ratio that suppresses other compression techniques. Despite this, existing adversarial attack methods on image classification are mostly performed in the pixel domain with few exceptions in the compressed domain, making them less applicable in real-world scenarios. In this paper, we propose a novel one-index attack method in the VQ domain to generate adversarial images by a differential evolution algorithm, successfully resulting in image misclassification in victim models. The one-index attack method modifies a single index in the compressed data stream so that the decompressed image is misclassified. It only needs to modify a single VQ index to realize an attack, which limits the number of perturbed indexes. The proposed method belongs to a semi-black-box attack, which is more in line with the actual attack scenario. We apply our method to attack three popular image classification models, i.e., Resnet, NIN, and VGG16. On average, 55.9 % and 77.4 % of the images in CIFAR-10 and Fashion MNIST, respectively, are successfully attacked, with a high level of misclassification confidence and a low level of image perturbation.</p></div>","PeriodicalId":54638,"journal":{"name":"Pattern Recognition Letters","volume":"186 ","pages":"Pages 47-56"},"PeriodicalIF":3.9,"publicationDate":"2024-09-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S0167865524002575/pdfft?md5=96833f101476805d73c37d5dd7083f2c&pid=1-s2.0-S0167865524002575-main.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142171796","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
CAST: Clustering self-Attention using Surrogate Tokens for efficient transformers
IF 3.9 · CAS Tier 3, Computer Science · Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE · Pub Date: 2024-09-06 · DOI: 10.1016/j.patrec.2024.08.024
Adjorn van Engelenhoven, Nicola Strisciuglio, Estefanía Talavera

The Transformer architecture has proven to be a powerful tool for a wide range of tasks. It is based on the self-attention mechanism, an inherently expensive operation with quadratic computational complexity: memory usage and compute time increase quadratically with the length of the input sequences, limiting the application of Transformers. In this work, we propose a novel Clustering self-Attention mechanism using Surrogate Tokens (CAST) to optimize the attention computation and achieve efficient transformers. CAST utilizes learnable surrogate tokens to construct a cluster affinity matrix, used to cluster the input sequence and generate novel cluster summaries. The self-attention from within each cluster is then combined with the cluster summaries of other clusters, enabling information flow across the entire input sequence. CAST improves efficiency by reducing the complexity from O(N²) to O(αN), where N is the sequence length and α is a constant determined by the number of clusters and samples per cluster. We show that CAST performs better than or comparably to baseline Transformers on long-range sequence modeling tasks, while also achieving greater time and memory efficiency than other efficient transformers.
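
A simplified reading of the surrogate-token mechanism is sketched below: single-head attention, hard cluster assignment, and each token attending to its own cluster's tokens plus all cluster summaries. The design choices here are illustrative assumptions, not the authors' implementation.

```python
# Sketch of clustering self-attention with learnable surrogate tokens.
import torch
import torch.nn.functional as F

class SurrogateClusterAttention(torch.nn.Module):
    def __init__(self, dim, n_clusters):
        super().__init__()
        self.surrogates = torch.nn.Parameter(torch.randn(n_clusters, dim))
        self.qkv = torch.nn.Linear(dim, 3 * dim)

    def forward(self, x):                        # x: (N, dim), one sequence
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        affinity = x @ self.surrogates.t()       # (N, K) cluster affinity matrix
        assign = affinity.argmax(dim=1)          # hard assignment for the sketch
        weights = F.softmax(affinity, dim=0)     # per-cluster token weights
        summaries = weights.t() @ v              # (K, dim) cluster summaries
        out = torch.zeros_like(x)
        for c in range(self.surrogates.shape[0]):
            idx = (assign == c).nonzero(as_tuple=True)[0]
            if idx.numel() == 0:
                continue
            # keys/values: this cluster's tokens plus every cluster's summary,
            # so information still flows across the whole sequence
            kc = torch.cat([k[idx], summaries], dim=0)
            vc = torch.cat([v[idx], summaries], dim=0)
            attn = F.softmax(q[idx] @ kc.t() / kc.shape[1] ** 0.5, dim=-1)
            out[idx] = attn @ vc
        return out
```

With roughly N/K tokens per cluster, each token attends to O(N/K + K) keys rather than N, which is where the overall O(αN) cost comes from.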

{"title":"CAST: Clustering self-Attention using Surrogate Tokens for efficient transformers","authors":"Adjorn van Engelenhoven,&nbsp;Nicola Strisciuglio,&nbsp;Estefanía Talavera","doi":"10.1016/j.patrec.2024.08.024","DOIUrl":"10.1016/j.patrec.2024.08.024","url":null,"abstract":"<div><p>The Transformer architecture has shown to be a powerful tool for a wide range of tasks. It is based on the self-attention mechanism, which is an inherently computationally expensive operation with quadratic computational complexity: memory usage and compute time increase quadratically with the length of the input sequences, thus limiting the application of Transformers. In this work, we propose a novel Clustering self-Attention mechanism using Surrogate Tokens (CAST), to optimize the attention computation and achieve efficient transformers. CAST utilizes learnable surrogate tokens to construct a cluster affinity matrix, used to cluster the input sequence and generate novel cluster summaries. The self-attention from within each cluster is then combined with the cluster summaries of other clusters, enabling information flow across the entire input sequence. CAST improves efficiency by reducing the complexity from <span><math><mrow><mi>O</mi><mrow><mo>(</mo><msup><mrow><mi>N</mi></mrow><mrow><mn>2</mn></mrow></msup><mo>)</mo></mrow></mrow></math></span> to <span><math><mrow><mi>O</mi><mrow><mo>(</mo><mi>α</mi><mi>N</mi><mo>)</mo></mrow></mrow></math></span> where <span><math><mi>N</mi></math></span> is the sequence length, and <span><math><mi>α</mi></math></span> is constant according to the number of clusters and samples per cluster. We show that CAST performs better than or comparable to the baseline Transformers on long-range sequence modeling tasks, while also achieving higher results on time and memory efficiency than other efficient transformers.</p></div>","PeriodicalId":54638,"journal":{"name":"Pattern Recognition Letters","volume":"186 ","pages":"Pages 30-36"},"PeriodicalIF":3.9,"publicationDate":"2024-09-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S0167865524002563/pdfft?md5=41d75a76c8436c27473bdc1f0c0144be&pid=1-s2.0-S0167865524002563-main.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142164130","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
Editorial for pattern recognition letters special issue on Advances in Disinformation Detection and Media Forensics
IF 3.9 · CAS Tier 3, Computer Science · Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE · Pub Date: 2024-09-05 · DOI: 10.1016/j.patrec.2024.09.004
Irene Amerini, Victor Sanchez, Luca Maiano
{"title":"Editorial for pattern recognition letters special issue on Advances in Disinformation Detection and Media Forensics","authors":"Irene Amerini,&nbsp;Victor Sanchez,&nbsp;Luca Maiano","doi":"10.1016/j.patrec.2024.09.004","DOIUrl":"10.1016/j.patrec.2024.09.004","url":null,"abstract":"","PeriodicalId":54638,"journal":{"name":"Pattern Recognition Letters","volume":"186 ","pages":"Pages 21-22"},"PeriodicalIF":3.9,"publicationDate":"2024-09-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142164129","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
SiamMAF: A multipath and feature-enhanced thermal infrared tracker
IF 3.9 · CAS Tier 3, Computer Science · Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE · Pub Date: 2024-09-03 · DOI: 10.1016/j.patrec.2024.09.003
Weisheng Li, Yuhao Fang, Lanbing Lv, Shunping Chen

Thermal infrared (TIR) images are visually blurred and low in information content. Some TIR trackers focus on enhancing the semantic information of TIR features, neglecting the detailed information that is equally important for TIR tracking. After target localization, detailed information can help the tracker generate accurate prediction boxes. In addition, simple element-wise addition does not fully utilize and fuse multiple response maps. To address these issues, this study proposes a multipath and feature-enhanced Siamese tracker (SiamMAF) for TIR tracking. We design a feature-enhanced module (FEM) based on complementarity, which can highlight the key semantic information of the target while preserving the detailed information of objects. Furthermore, we introduce a response fusion module (RFM) that can adaptively fuse multiple response maps. Extensive experimental results on two challenging benchmarks show that SiamMAF outperforms many existing state-of-the-art TIR trackers and runs at a steady 31 FPS.
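
As one plausible realization of adaptive fusion, the sketch below weights each response map with input-dependent softmax weights derived from global pooling; the paper's actual RFM may be built differently.

```python
# Sketch: adaptively fuse several response maps instead of element-wise addition.
import torch
import torch.nn.functional as F

class ResponseFusion(torch.nn.Module):
    def __init__(self, n_maps):
        super().__init__()
        self.score = torch.nn.Linear(n_maps, n_maps)    # learns map importance

    def forward(self, maps):                      # maps: (B, n_maps, H, W)
        pooled = maps.mean(dim=(2, 3))            # (B, n_maps) global context
        w = F.softmax(self.score(pooled), dim=-1)   # input-dependent weights
        return (maps * w[:, :, None, None]).sum(dim=1)  # (B, H, W) fused map
```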

{"title":"SiamMAF: A multipath and feature-enhanced thermal infrared tracker","authors":"Weisheng Li,&nbsp;Yuhao Fang,&nbsp;Lanbing Lv,&nbsp;Shunping Chen","doi":"10.1016/j.patrec.2024.09.003","DOIUrl":"10.1016/j.patrec.2024.09.003","url":null,"abstract":"<div><p>Thermal infrared (TIR) images are visually blurred and low in information content. Some TIR trackers focus on enhancing the semantic information of TIR features, neglecting the equally important detailed information for TIR tracking. After target localization, detailed information can assist the tracker in generating accurate prediction boxes. In addition, simple element-wise addition is not a way to fully utilize and fuse multiple response maps. To address these issues, this study proposes a multipath and feature-enhanced Siamese tracker (SiamMAF) for TIR tracking. We design a feature-enhanced module (FEM) based on complementarity, which can highlight the key semantic information of the target and preserve the detailed information of objects. Furthermore, we introduce a response fusion module (RFM) that can adaptively fuse multiple response maps. Extensive experimental results on two challenging benchmarks show that SiamMAF outperforms many existing state-of-the-art TIR trackers and runs at a steady 31FPS.</p></div>","PeriodicalId":54638,"journal":{"name":"Pattern Recognition Letters","volume":"186 ","pages":"Pages 37-46"},"PeriodicalIF":3.9,"publicationDate":"2024-09-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142169501","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
Visual speech recognition using compact hypercomplex neural networks
IF 3.9 · CAS Tier 3, Computer Science · Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE · Pub Date: 2024-09-03 · DOI: 10.1016/j.patrec.2024.09.002
Iason Ioannis Panagos, Giorgos Sfikas, Christophoros Nikou

Advances in deep learning and large-scale public datasets have brought visual speech recognition systems impressive performance compared to human professionals. The potential applications of these systems in real-life scenarios are numerous and could greatly benefit many individuals. However, most of these systems are not designed with practicality in mind: they require large models and powerful hardware, which limits their applicability in resource-constrained environments and other real-world tasks. In addition, few works focus on developing lightweight systems that can be deployed in such conditions. Considering these issues, we propose compact networks that take advantage of hypercomplex layers, which use a sum of Kronecker products to reduce overall parameter demands and model sizes. We train and evaluate our proposed models on the largest public dataset for single-word speech recognition in English. Our experiments show that high compression rates are achievable with a minimal drop in accuracy, indicating the method's potential for practical applications in lower-resource environments. Code and models are available at https://github.com/jpanagos/vsr_phm.
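
A minimal sketch of a parameterized hypercomplex linear layer is shown below, assuming the generic formulation in which the weight matrix is a sum of n Kronecker products, W = Σᵢ Aᵢ ⊗ Bᵢ; the layers used in the paper may differ in detail.

```python
# Sketch: a linear layer whose weight is a sum of Kronecker products, cutting
# the parameter count from out*in down to n^3 + (out*in)/n.
import torch

class PHMLinear(torch.nn.Module):
    def __init__(self, n, in_features, out_features):
        super().__init__()
        assert in_features % n == 0 and out_features % n == 0
        self.A = torch.nn.Parameter(torch.randn(n, n, n))
        self.B = torch.nn.Parameter(
            torch.randn(n, out_features // n, in_features // n))

    def forward(self, x):                        # x: (..., in_features)
        # kron((n, n), (out/n, in/n)) -> (out, in); summed over the n factors
        W = sum(torch.kron(self.A[i], self.B[i]) for i in range(self.A.shape[0]))
        return x @ W.t()
```

With n = 4, for example, the layer stores roughly a quarter of the parameters of a dense layer of the same shape, which is where the compression comes from.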

{"title":"Visual speech recognition using compact hypercomplex neural networks","authors":"Iason Ioannis Panagos ,&nbsp;Giorgos Sfikas ,&nbsp;Christophoros Nikou","doi":"10.1016/j.patrec.2024.09.002","DOIUrl":"10.1016/j.patrec.2024.09.002","url":null,"abstract":"<div><p>Recent progress in visual speech recognition systems due to advances in deep learning and large-scale public datasets has led to impressive performance compared to human professionals. The potential applications of these systems in real-life scenarios are numerous and can greatly benefit the lives of many individuals. However, most of these systems are not designed with practicality in mind, requiring large-size models and powerful hardware, factors which limit their applicability in resource-constrained environments and other real-world tasks. In addition, few works focus on developing lightweight systems that can be deployed in such conditions. Considering these issues, we propose compact networks that take advantage of hypercomplex layers that utilize a sum of Kronecker products to reduce overall parameter demands and model sizes. We train and evaluate our proposed models on the largest public dataset for single word speech recognition for English. Our experiments show that high compression rates are achievable with a minimal accuracy drop, indicating the method’s potential for practical applications in lower-resource environments. Code and models are available at <span><span>https://github.com/jpanagos/vsr_phm</span><svg><path></path></svg></span>.</p></div>","PeriodicalId":54638,"journal":{"name":"Pattern Recognition Letters","volume":"186 ","pages":"Pages 1-7"},"PeriodicalIF":3.9,"publicationDate":"2024-09-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142164128","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
A method for evaluating deep generative models of images for hallucinations in high-order spatial context
IF 3.9 · CAS Tier 3, Computer Science · Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE · Pub Date: 2024-09-02 · DOI: 10.1016/j.patrec.2024.08.023
Rucha Deshpande, Mark A. Anastasio, Frank J. Brooks

Deep generative models (DGMs) have the potential to revolutionize diagnostic imaging. Generative adversarial networks (GANs) are one widely employed kind of DGM. The overarching problem with deploying any sort of DGM in mission-critical applications is the lack of adequate and/or automatic means of assessing the domain-specific quality of generated images. In this work, we demonstrate several objective and human-interpretable tests of images output by two popular DGMs. These tests serve two goals: (i) ruling out DGMs for downstream, domain-specific applications, and (ii) quantifying hallucinations in the expected spatial context of DGM-generated images. The designed datasets are made public, and the proposed tests could also serve as benchmarks and aid the prototyping of emerging DGMs. Although these tests are demonstrated on GANs, they can be employed as a benchmark for evaluating any DGM. Specifically, we designed several stochastic context models (SCMs) of distinct image features that can be recovered after generation by a trained DGM. Together, these SCMs encode features as per-image constraints on prevalence, position, intensity, and/or texture. Several of these features are high-order, algorithmic pixel-arrangement rules that are not readily expressed in covariance matrices. We designed and validated statistical classifiers to detect specific effects of the known arrangement rules. We then tested the rates at which two different DGMs correctly reproduced the feature context under a variety of training scenarios and degrees of feature-class similarity. We found that ensembles of generated images can appear largely accurate visually and show high accuracy in ensemble measures, while not exhibiting the known spatial arrangements. The main conclusion is that SCMs can be engineered, and serve as benchmarks, to quantify numerous per-image errors, i.e., hallucinations, that may not be captured in ensemble statistics but can plausibly affect subsequent use of the DGM-generated images.
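
The toy example below conveys the flavor of such a test: a sampler that enforces a per-image rule (exactly k bright blobs) and a per-image check that counts rule violations in a generated ensemble. The rule, threshold, and blob model are invented for illustration and are not one of the paper's actual SCMs.

```python
# Toy stochastic context model: every image must contain exactly k blobs.
# Ensemble statistics (e.g., mean intensity) can look fine even when many
# individual images violate the rule; the per-image check exposes that.
import numpy as np
from scipy import ndimage

def sample_scm_image(k=5, size=64, rng=None):
    rng = rng or np.random.default_rng()
    img = np.zeros((size, size))
    ys = rng.integers(4, size - 4, k)            # random blob positions
    xs = rng.integers(4, size - 4, k)
    img[ys, xs] = 1.0
    return ndimage.gaussian_filter(img, sigma=1.5)

def violates_rule(img, k=5, thresh=0.02):
    _, n_found = ndimage.label(img > thresh)     # count connected components
    return n_found != k                          # per-image constraint check

ensemble = [sample_scm_image(rng=np.random.default_rng(s)) for s in range(100)]
violation_rate = float(np.mean([violates_rule(im) for im in ensemble]))
print(f"per-image hallucination rate: {violation_rate:.2%}")
```

Run on images from a trained DGM instead of the reference sampler, the same per-image check yields the kind of hallucination rate the abstract argues ensemble statistics can miss.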

{"title":"A method for evaluating deep generative models of images for hallucinations in high-order spatial context","authors":"Rucha Deshpande ,&nbsp;Mark A. Anastasio ,&nbsp;Frank J. Brooks","doi":"10.1016/j.patrec.2024.08.023","DOIUrl":"10.1016/j.patrec.2024.08.023","url":null,"abstract":"<div><p>Deep generative models (DGMs) have the potential to revolutionize diagnostic imaging. Generative adversarial networks (GANs) are one kind of DGM which are widely employed. The overarching problem with deploying any sort of DGM in mission-critical applications is a lack of adequate and/or automatic means of assessing the domain-specific quality of generated images. In this work, we demonstrate several objective and human-interpretable tests of images output by two popular DGMs. These tests serve two goals: (i) ruling out DGMs for downstream, domain-specific applications, and (ii) quantifying hallucinations in the expected spatial context in DGM-generated images. The designed datasets are made public and the proposed tests could also serve as benchmarks and aid the prototyping of emerging DGMs. Although these tests are demonstrated on GANs, they can be employed as a benchmark for evaluating any DGM. Specifically, we designed several stochastic context models (SCMs) of distinct image features that can be recovered after generation by a trained DGM. Together, these SCMs encode features as per-image constraints in prevalence, position, intensity, and/or texture. Several of these features are high-order, algorithmic pixel-arrangement rules which are not readily expressed in covariance matrices. We designed and validated statistical classifiers to detect specific effects of the known arrangement rules. We then tested the rates at which two different DGMs correctly reproduced the feature context under a variety of training scenarios, and degrees of feature-class similarity. We found that ensembles of generated images can appear largely accurate visually, and show high accuracy in ensemble measures, while not exhibiting the known spatial arrangements. The main conclusion is that SCMs can be engineered, and serve as benchmarks, to quantify numerous <em>per image</em> errors, <em>i.e.</em>, hallucinations, that may not be captured in ensemble statistics but plausibly can affect subsequent use of the DGM-generated images.</p></div>","PeriodicalId":54638,"journal":{"name":"Pattern Recognition Letters","volume":"186 ","pages":"Pages 23-29"},"PeriodicalIF":3.9,"publicationDate":"2024-09-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S0167865524002551/pdfft?md5=5df7937160b427d56d6a3c847ac5fdfc&pid=1-s2.0-S0167865524002551-main.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142164131","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
Introduction to the special section "Advances trends of pattern recognition for intelligent systems applications" (SS:ISPR23)
IF 3.9 · CAS Tier 3, Computer Science · Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE · Pub Date: 2024-09-01 · DOI: 10.1016/j.patrec.2024.08.005
Akram Bennour, Tolga Ensari, Mohammed Al-Shabi
{"title":"Introduction to the special section “Advances trends of pattern recognition for intelligent systems applications” (SS:ISPR23)","authors":"Akram Bennour ,&nbsp;Tolga Ensari ,&nbsp;Mohammed Al-Shabi","doi":"10.1016/j.patrec.2024.08.005","DOIUrl":"10.1016/j.patrec.2024.08.005","url":null,"abstract":"","PeriodicalId":54638,"journal":{"name":"Pattern Recognition Letters","volume":"185 ","pages":"Page 271"},"PeriodicalIF":3.9,"publicationDate":"2024-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142185733","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
A lightweight attention-driven distillation model for human pose estimation
IF 3.9 · CAS Tier 3, Computer Science · Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE · Pub Date: 2024-09-01 · DOI: 10.1016/j.patrec.2024.08.009
Falai Wei, Xiaofang Hu

Currently, research on human pose estimation primarily focuses on heatmap-based and regression-based methods. However, the increasing complexity of heatmap models and the low accuracy of regression methods are becoming significant barriers to the advancement of the field. In recent years, researchers have begun exploring new methods to transfer knowledge from heatmap models to regression models. Recognizing the limitations of existing approaches, our study introduces a novel distillation model that is both lightweight and precise. In the feature extraction phase, we design the Channel-Attention-Unit (CAU), which integrates group convolution with an attention mechanism to effectively reduce redundancy while maintaining model accuracy with a decreased parameter count. During distillation, we develop the attention loss function, L_A, which enhances the model's capacity to locate key points quickly and accurately, emulating the effect of additional transformer layers and boosting precision without increasing parameters or network depth. Specifically, on the CrowdPose test dataset, our model achieves 71.7% mAP with 4.3M parameters, 2.2 GFLOPs, and 51.3 FPS. Experimental results demonstrate the model's strong accuracy and efficiency, making it a viable option for real-time pose estimation in real-world environments.
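
A hedged sketch of the distillation objective is given below; rendering the attention loss L_A as an MSE between normalized student attention maps and teacher heatmaps, and the weight w_attn, are illustrative assumptions rather than the paper's exact definitions.

```python
# Sketch: regression-based student distilled from a heatmap teacher, with an
# attention term that pulls the student's attention onto keypoint locations.
import torch
import torch.nn.functional as F

def distillation_loss(pred_xy, gt_xy, student_attn, teacher_heatmaps, w_attn=0.1):
    """pred_xy, gt_xy: (B, K, 2); student_attn, teacher_heatmaps: (B, K, H, W)."""
    reg = F.l1_loss(pred_xy, gt_xy)              # standard coordinate regression
    # normalize both maps into distributions so only *where* they attend matters
    s = F.softmax(student_attn.flatten(2), dim=-1)
    t = F.softmax(teacher_heatmaps.flatten(2), dim=-1)
    l_a = F.mse_loss(s, t)                       # the attention alignment term
    return reg + w_attn * l_a
```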

{"title":"A lightweight attention-driven distillation model for human pose estimation","authors":"Falai Wei,&nbsp;Xiaofang Hu","doi":"10.1016/j.patrec.2024.08.009","DOIUrl":"10.1016/j.patrec.2024.08.009","url":null,"abstract":"<div><p>Currently, research on human pose estimation tasks primarily focuses on heatmap-based and regression-based methods. However, the increasing complexity of heatmap models and the low accuracy of regression methods are becoming significant barriers to the advancement of the field. In recent years, researchers have begun exploring new methods to transfer knowledge from heatmap models to regression models. Recognizing the limitations of existing approaches, our study introduces a novel distillation model that is both lightweight and precise. In the feature extraction phase, we design the Channel-Attention-Unit (CAU), which integrates group convolution with an attention mechanism to effectively reduce redundancy while maintaining model accuracy with a decreased parameter count. During distillation, we develop the attention loss function, <span><math><msub><mrow><mi>L</mi></mrow><mrow><mi>A</mi></mrow></msub></math></span>, which enhances the model’s capacity to locate key points quickly and accurately, emulating the effect of additional transformer layers and boosting precision without the need for increased parameters or network depth. Specifically, on the CrowdPose test dataset, our model achieves 71.7% mAP with 4.3M parameters, 2.2 GFLOPs, and 51.3 FPS. Experimental results demonstrates the model’s strong capabilities in both accuracy and efficiency, making it a viable option for real-time posture estimation tasks in real-world environments.</p></div>","PeriodicalId":54638,"journal":{"name":"Pattern Recognition Letters","volume":"185 ","pages":"Pages 247-253"},"PeriodicalIF":3.9,"publicationDate":"2024-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142097514","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
A semantic guidance-based fusion network for multi-label image classification
IF 3.9 · CAS Tier 3, Computer Science · Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE · Pub Date: 2024-09-01 · DOI: 10.1016/j.patrec.2024.08.020
Jiuhang Wang, Hongying Tang, Shanshan Luo, Liqi Yang, Shusheng Liu, Aoping Hong, Baoqing Li

Multi-label image classification (MLIC), a fundamental task that assigns multiple labels to each image, has seen notable progress in recent years. Because objects appear simultaneously in the physical world, modeling object correlations is crucial for enhancing classification accuracy. This involves accounting for both spatial image-feature correlation and label semantic correlation. However, existing methods struggle to establish these correlations due to complex spatial locations and label semantic relationships. Moreover, to fuse image-feature relevance with label semantic relevance, existing methods typically learn a semantic representation in the final CNN layer to combine spatial and label semantic correlations, even though different CNN layers capture features at diverse scales and possess distinct discriminative abilities. To address these issues, in this paper we introduce the Semantic Guidance-Based Fusion Network (SGFN) for MLIC. To model spatial image-feature correlation, we leverage the advanced TResNet architecture as the backbone network and employ a Feature Aggregation Module to capture global spatial correlation. For label semantic correlation, we establish both local and global semantic correlation. We further enrich model features by learning semantic representations across multiple convolutional layers. Our method outperforms current state-of-the-art techniques on the PASCAL VOC (2007, 2012) and MS-COCO datasets.
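
One common way to build the global label semantic correlation that such a network relies on is a thresholded conditional co-occurrence matrix, sketched below; whether SGFN constructs its correlation this way is an assumption, and tau is an illustrative hyperparameter.

```python
# Sketch: build a label-correlation graph from training annotations.
import numpy as np

def label_correlation(labels, tau=0.4):
    """labels: (n_images, n_classes) binary matrix -> (C, C) correlation graph."""
    counts = labels.T @ labels                      # pairwise co-occurrence counts
    freq = np.diag(counts).astype(float)            # images per label
    cond = counts / np.maximum(freq[:, None], 1.0)  # cond[i, j] = P(label j | label i)
    adj = (cond >= tau).astype(float)               # keep only confident correlations
    np.fill_diagonal(adj, 1.0)
    return adj / adj.sum(axis=1, keepdims=True)     # row-normalized for propagation
```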

{"title":"A semantic guidance-based fusion network for multi-label image classification","authors":"Jiuhang Wang ,&nbsp;Hongying Tang ,&nbsp;Shanshan Luo ,&nbsp;Liqi Yang ,&nbsp;Shusheng Liu ,&nbsp;Aoping Hong ,&nbsp;Baoqing Li","doi":"10.1016/j.patrec.2024.08.020","DOIUrl":"10.1016/j.patrec.2024.08.020","url":null,"abstract":"<div><p>Multi-label image classification (MLIC), a fundamental task assigning multiple labels to each image, has been seen notable progress in recent years. Considering simultaneous appearances of objects in the physical world, modeling object correlations is crucial for enhancing classification accuracy. This involves accounting for spatial image feature correlation and label semantic correlation. However, existing methods struggle to establish these correlations due to complex spatial location and label semantic relationships. On the other hand, regarding the fusion of image feature relevance and label semantic relevance, existing methods typically learn a semantic representation in the final CNN layer to combine spatial and label semantic correlations. However, different CNN layers capture features at diverse scales and possess distinct discriminative abilities. To address these issues, in this paper we introduce the Semantic Guidance-Based Fusion Network (SGFN) for MLIC. To model spatial image feature correlation, we leverage the advanced TResNet architecture as the backbone network and employ the Feature Aggregation Module for capturing global spatial correlation. For label semantic correlation, we establish both local and global semantic correlation. We further enrich model features by learning semantic representations across multiple convolutional layers. Our method outperforms current state-of-the-art techniques on PASCAL VOC (2007, 2012) and MS-COCO datasets.</p></div>","PeriodicalId":54638,"journal":{"name":"Pattern Recognition Letters","volume":"185 ","pages":"Pages 254-261"},"PeriodicalIF":3.9,"publicationDate":"2024-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142097515","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0