
Pattern Recognition Letters — Latest Articles

Choroid plexus segmentation in MRI using the novel T1×FLAIR modality and PSU-Mamba: projective scan U-Mamba approach
IF 3.3 | CAS Tier 3 (Computer Science) | Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2026-04-01 | Epub Date: 2026-01-25 | DOI: 10.1016/j.patrec.2026.01.024 | Vol. 202, pp. 1–7
Lia Schmid , Giuseppe M. Facchi , Francesco Agnelli , Giorgio Bocca , Luca Sacchi , Raffaella Lanzarotti
The Choroid Plexus (CP) is emerging as a biomarker for neurodegenerative diseases (NDDs) such as Alzheimer’s Disease and its precursor pathologies. However, segmentation remains challenging, especially without Contrast-Enhanced T1-weighted (CE-T1w) imaging, which is invasive and rarely used in NDDs. To address these challenges, we present three key contributions. First, we propose and validate T1×FLAIR, a novel, non-invasive modality created by gamma-corrected voxelwise multiplication of coregistered T1w and FLAIR images. Expert visual inspection confirmed that this choice enhances CP visibility while preserving standard resolution. Second, we release ChP-MRI, a high-quality MRI dataset of 168 patients with NDDs or Multiple Sclerosis, including T1w, FLAIR, and T1×FLAIR images with expert-verified CP segmentations. The dataset is multi-pathology and accompanied by demographic details to support benchmarking. Third, we propose PSU-Mamba (Projective Scan U-Mamba), an adaptation of the U-Mamba segmentation model in which the first encoder block is a Mamba layer equipped with a PCA-based scan path derived from anatomical priors. This design enhances segmentation accuracy, maintains linear complexity, and converges faster, requiring fewer training epochs. Experiments on ChP-MRI confirm that T1×FLAIR is a more faithful substitute for CE-T1w than T1w, and that PSU-Mamba offers systematic robustness in segmenting the CP. The source code and the dataset are available at https://github.com/phuselab/PSU_Mamba#.
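The modality construction the abstract describes — gamma-corrected voxelwise multiplication of coregistered volumes — can be sketched as follows. The min–max normalization scheme and the gamma value of 0.7 are illustrative assumptions, not values stated in the abstract:

```python
import numpy as np

def t1_x_flair(t1w, flair, gamma=0.7):
    """Combine coregistered T1w and FLAIR volumes by voxelwise
    multiplication followed by gamma correction.

    `gamma=0.7` is an illustrative choice (gamma < 1 brightens
    low-intensity structures); the paper's exact value may differ.
    """
    # normalize each volume to [0, 1] so the product stays in range
    t1n = (t1w - t1w.min()) / (np.ptp(t1w) + 1e-8)
    fln = (flair - flair.min()) / (np.ptp(flair) + 1e-8)
    product = t1n * fln
    # gamma correction on the joint intensity map
    return np.power(product, gamma)
```

Both inputs are assumed to be already coregistered (same voxel grid), as the abstract specifies.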
Citations: 0
Underwater image color correction via global-local collaborative strategy
IF 3.3 | CAS Tier 3 (Computer Science) | Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2026-03-01 | Epub Date: 2026-01-23 | DOI: 10.1016/j.patrec.2026.01.022 | Vol. 201, pp. 160–167
Ling Zhou , Baiqiang Yu , Hengyu Li , Wenyi Zhao , Weidong Zhang
Underwater images often suffer from color distortion, blur, and low contrast due to light scattering and absorption. To mitigate the color distortion, we propose GLCS, a color correction method for underwater images that leverages a global-local collaborative strategy. Specifically, we construct a weight matrix that guides the least-attenuated channel in globally compensating the other channels. We then design a local feedback strategy that dynamically adjusts the weight matrix according to the image’s local color bias, enabling collaborative correction between the global and local components. Finally, we design a loss function combining color difference, mean, and standard deviation disparities to control the iteration process and optimize the compensation. Extensive experiments reveal that GLCS, as a preprocessing step, effectively alleviates color distortion in underwater images and significantly enhances the visual quality and performance of subsequent image enhancement methods.
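The core idea of letting the least-attenuated channel drive global compensation can be illustrated with a minimal sketch. This is a generic channel-compensation scheme, not GLCS's actual weight matrix or feedback loop; the update rule below is an assumption for illustration:

```python
import numpy as np

def compensate_channels(img):
    """Globally compensate attenuated color channels using the
    least-attenuated channel (highest mean intensity) as a guide.

    img: float array in [0, 1] with shape (H, W, 3).
    Illustrative only; GLCS additionally adapts a weight matrix
    via local feedback, which is not modeled here.
    """
    means = img.mean(axis=(0, 1))
    ref = int(np.argmax(means))  # least-attenuated channel
    out = img.copy()
    for c in range(3):
        if c == ref:
            continue
        # shift each attenuated channel toward the reference channel's
        # mean, weighted by the reference channel so that well-lit
        # regions receive the largest correction
        out[..., c] = img[..., c] + (means[ref] - means[c]) * img[..., ref]
    return np.clip(out, 0.0, 1.0)
```

In typical underwater scenes the red channel attenuates fastest, so it receives the strongest compensation under this rule.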
Citations: 0
Hybrid attention triple branch transformer net for underwater image enhancement
IF 3.3 | CAS Tier 3 (Computer Science) | Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2026-03-01 | Epub Date: 2026-01-13 | DOI: 10.1016/j.patrec.2026.01.014 | Vol. 201, pp. 95–102
Shaohui Jin , Guangpeng Li , Ziqin Xu , Yanxin Zhang , Zhengguang Qin , Hao Liu , Mingliang Xu
In real underwater scenes, the complexity of the environment leads to issues like light attenuation, scattering, and color distortion, resulting in reduced image quality and loss of details. To resolve these problems, we propose a hybrid attention triple branch transformer network (HATBformer). The backbone network adopts a three-layer encoder-decoder structure, making full use of the spatial and channel feature information of underwater images and improving the network’s focus on color information and on spatial regions with higher levels of attenuation. The detail enhancement branch incorporates a coordinate information perception mechanism and a feature integration strategy through three consecutive feature enhancement blocks, aiming to deeply repair and optimize image details and effectively improve reconstruction quality. In addition, we establish NLOS-TW, an underwater image dataset covering different optical thicknesses, rich targets, and various underwater scenes. Extensive experiments demonstrate that our method significantly enhances image quality and surpasses current state-of-the-art methods both qualitatively and quantitatively.
Citations: 0
Clustering criteria: What defines a good cluster?
IF 3.3 | CAS Tier 3 (Computer Science) | Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2026-03-01 | Epub Date: 2026-01-13 | DOI: 10.1016/j.patrec.2026.01.011 | Vol. 201, pp. 103–108
Jinli Yao, Yong Zeng
Clustering is a fundamental technique in unsupervised learning, enabling the discovery of patterns and natural groupings in data without prior labels. Despite its widespread applications across domains, the field of clustering faces persistent challenges, including a lack of universally accepted definitions, inconsistent classification criteria, and varying evaluation metrics. This review paper addresses these gaps by exploring the core question: What defines a good cluster? We investigate and summarize the induction principle behind clustering problems, clustering algorithms, and evaluation indices. The paper classifies clustering algorithms based on their criteria and principles, providing a structured understanding of their methodologies. It further categorizes datasets into synthetic and real-world examples, identifying the challenges posed by diverse cluster characteristics, such as varying shapes, densities, sizes, and overlapping cases, alongside high dimensionality. A comprehensive review of evaluation indices, grouped into compactness, connectedness, and separation types, highlights their importance in assessing clustering quality. By consolidating these aspects, this review provides a cohesive framework to understand clustering principles and their applications.
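The compactness and separation families of indices the review groups together can be illustrated with two minimal centroid-based measures. These particular definitions are simplifications of our own, not indices proposed by the paper:

```python
import numpy as np

def compactness(X, labels):
    """Mean distance of points to their own cluster centroid
    (lower is better): a simple within-cluster compactness index."""
    return np.mean([
        np.linalg.norm(X[labels == k] - X[labels == k].mean(axis=0),
                       axis=1).mean()
        for k in np.unique(labels)
    ])

def separation(X, labels):
    """Minimum pairwise distance between cluster centroids
    (higher is better): a simple between-cluster separation index."""
    cents = np.array([X[labels == k].mean(axis=0)
                      for k in np.unique(labels)])
    d = np.linalg.norm(cents[:, None] - cents[None, :], axis=-1)
    return d[np.triu_indices(len(cents), k=1)].min()
```

Indices such as Davies–Bouldin or the silhouette coefficient combine both notions into a single score; connectedness-type indices instead examine whether neighboring points share a cluster.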
Citations: 0
Generalization performance distributions along learning curves
IF 3.3 | CAS Tier 3 (Computer Science) | Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2026-03-01 | Epub Date: 2026-01-03 | DOI: 10.1016/j.patrec.2026.01.003 | Vol. 201, pp. 29–36
O. Taylan Turan , Marco Loog , David M.J. Tax
Learning curves show the expected performance as a function of training set size. They are often used to evaluate and compare models, tune hyper-parameters, and determine how much data is needed to reach a specific performance. However, the distributional properties of performance along learning curves are frequently overlooked; generally, only an average with standard error or standard deviation is reported. In this paper, we analyze the distributions of generalization performance along learning curves. We compile a high-fidelity learning curve database, varying both the training set size and the number of sampling repetitions for a fixed training set size. Our investigation reveals that generalization performance rarely follows a Gaussian distribution for classical classifiers, regardless of dataset balance, loss function, sampling method, or hyper-parameter tuning along learning curves. Furthermore, we show that the choice of statistical summary (the mean versus measures such as quantiles) affects the top model rankings. Our findings highlight the importance of considering different statistical measures and using non-parametric approaches when evaluating and selecting machine learning models with learning curves.
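The claim that the summary statistic can flip model rankings is easy to demonstrate on toy data: a model with a lower mean error can have a higher median error when its distribution across repetitions is skewed. The scores below are fabricated for illustration only:

```python
import numpy as np

def summarize(scores, how="mean"):
    """Summarize per-repetition test errors of one model at one
    training-set size. Different summaries can rank models
    differently when the distribution is skewed or heavy-tailed."""
    if how == "mean":
        return scores.mean()
    if how == "median":
        return np.median(scores)
    if how == "q90":
        return np.quantile(scores, 0.9)
    raise ValueError(how)

# toy repetitions: model A is consistently mediocre; model B is
# usually better but has one catastrophic run, skewing its mean
a = np.array([0.10, 0.10, 0.10, 0.10, 0.10])  # mean 0.10, median 0.10
b = np.array([0.05, 0.05, 0.05, 0.05, 0.40])  # mean 0.12, median 0.05
```

Here the mean prefers model A while the median prefers model B — exactly the kind of ranking disagreement the paper examines.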
Citations: 0
LIFR-Net: A lightweight hybrid neural network with feature grouping for efficient food image recognition
IF 3.3 | CAS Tier 3 (Computer Science) | Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2026-03-01 | Epub Date: 2025-12-25 | DOI: 10.1016/j.patrec.2025.12.011 | Vol. 201, pp. 22–28
Qingshuo Sun , Guorui Sheng , Xiangyi Zhu , Jingru Song , Yongqiang Song , Tao Yao , Haiyang Wang , Lili Wang
Food image recognition based on deep learning plays a crucial role in the field of food computing. However, its high demand for computing resources limits deployment on end devices and hinders intelligent diet and nutrition management. To address this issue, we aim to balance computational efficiency with recognition accuracy and propose a compact food image recognition model named Lightweight Inter-Group Food Recognition Net (LIFR-Net) that combines a Convolutional Neural Network (CNN) and a Vision Transformer (ViT). In LIFR-Net, a lightweight ViT module called the Lightweight Inter-group Transformer (LIT) is designed, and a lightweight component named the Feature Grouping Transformer is constructed, which can efficiently extract local and global features of food images while optimizing the parameter count and computational complexity. In addition, by shuffling and fusing irregularly grouped feature maps, information exchange among channels is enhanced and recognition accuracy is improved. Extensive experiments on three commonly used public food image recognition datasets, namely ETHZ Food-101, Vireo Food-172, and UEC Food-256, show that LIFR-Net achieves recognition accuracies of 90.49%, 91.04%, and 74.23%, respectively, with fewer parameters and lower computational cost.
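The channel-shuffling idea the abstract describes is the standard ShuffleNet-style operation; LIFR-Net's irregular grouping is a variant of the same mechanism. A minimal sketch of the regular-grouping version:

```python
import numpy as np

def channel_shuffle(x, groups):
    """Interleave channels across groups so information can flow
    between otherwise isolated channel groups.

    x: (N, C, H, W) feature map with C divisible by `groups`.
    Shown here with regular groups; LIFR-Net shuffles *irregularly*
    grouped features, which this sketch does not model.
    """
    n, c, h, w = x.shape
    assert c % groups == 0
    # (N, groups, C//groups, H, W) -> swap the two group axes ->
    # flatten back, so each output position mixes channels from
    # different input groups
    return (x.reshape(n, groups, c // groups, h, w)
             .transpose(0, 2, 1, 3, 4)
             .reshape(n, c, h, w))
```

For 4 channels in 2 groups, channel order [0, 1, 2, 3] becomes [0, 2, 1, 3]: each adjacent pair now spans both original groups.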
Citations: 0
FHPG: A unified framework for transformer with pruning and quantization
IF 3.3 | CAS Tier 3 (Computer Science) | Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2026-03-01 | Epub Date: 2026-01-22 | DOI: 10.1016/j.patrec.2026.01.020 | Vol. 201, pp. 174–179
Ruiguo Ren
Vision transformers (ViTs) have demonstrated strong performance across various vision tasks; however, their high computational demands limit practical deployment. Although unified post-training frameworks for pruning and quantization have been applied to deep neural networks, existing methods do not explicitly integrate Fisher–Hessian information for structured pruning and quantization. To address this limitation, we propose Fisher Hessian particle swarm optimization–gravitational search algorithm (FHPG), a unified framework that jointly performs structured pruning and quantization to improve compression efficiency and accuracy. FHPG leverages Fisher–Hessian metrics to generate pruning masks and quantization intervals, reducing parameter redundancy and guiding quantization more effectively. In addition, a hybrid particle swarm optimization and gravitational search (PSO–GSA) strategy is incorporated to enhance optimization stability and avoid local minima. Experiments on standard vision benchmarks with transformer architectures, including DeiT and Swin, demonstrate that FHPG achieves substantial reductions in model size and inference latency while maintaining accuracy loss within approximately 1%.
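One way Fisher information can drive a pruning mask is to score each weight by its mean squared gradient over a batch of samples (the diagonal empirical Fisher) and prune the lowest-scoring fraction. This is a generic simplification, not FHPG's method, which additionally uses Hessian information and a PSO-GSA search:

```python
import numpy as np

def fisher_pruning_mask(grads, sparsity=0.5):
    """Build a keep-mask from an empirical diagonal Fisher score.

    grads: (num_samples, num_weights) per-sample gradients of the
    loss w.r.t. each weight. Weights with the smallest mean squared
    gradient contribute least to the loss landscape and are pruned.
    Simplified sketch only; FHPG's Fisher-Hessian criterion and
    quantization-interval generation are not reproduced here.
    """
    fisher = (grads ** 2).mean(axis=0)   # diagonal empirical Fisher
    k = int(sparsity * fisher.size)      # number of weights to prune
    threshold = np.sort(fisher)[k]       # k-th smallest score
    return fisher >= threshold           # True = keep this weight
```

The same scores could then rank weights for quantization sensitivity, which is the spirit of using one information metric for both compression steps.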
Citations: 0
DBASNet: A double-branch adaptive segmentation network for remote sensing image
IF 3.3 | CAS Tier 3 (Computer Science) | Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2026-03-01 | Epub Date: 2025-11-30 | DOI: 10.1016/j.patrec.2025.11.043 | Vol. 201, pp. 9–14
Bo Huang , Yiwei Lu , Changsheng Yin , Ruopeng Yang , Yu Tao , Yongqi Shi , Shijie Wang , Qian Zhao
With the rapid development of artificial intelligence technology, deep learning has been widely applied in the semantic segmentation of remote sensing images. Current methods for remote sensing semantic segmentation mainly employ architectures based on convolutional neural networks and Transformer networks, achieving good performance in segmentation tasks. However, existing approaches fail to optimize segmentation for diverse terrain characteristics, leading to limitations in segmentation accuracy in complex scenes. To address this, we propose a novel network called DBASNet, which consists of two decoding branches: road topology and terrain classification. The former focuses on the integrity of the topological structure of road terrains, while the latter emphasizes the accuracy of other terrain segmentations. Experiments demonstrate that DBASNet achieves state-of-the-art semantic segmentation results by balancing terrain segmentation accuracy with road connectivity on the LoveDA and LandCover.ai datasets.
Citations: 0
From-scratch dexterous grasp type annotation with SAM and lightweight vision-language models
IF 3.3 3区 计算机科学 Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date : 2026-03-01 Epub Date: 2026-01-17 DOI: 10.1016/j.patrec.2026.01.018
Yifan Wang , Long Cheng
Dexterous robotic hands enable versatile manipulation but require large annotated datasets for training, which are costly to obtain. This work presents a framework that integrates the Segment Anything Model (SAM) and small-scale vision-language models (VLMs) to automatically generate annotations from RGB-D images. Guided by the Fugl-Meyer grasp taxonomy and prompt engineering, the system produces labeled data from scratch, including object segmentation masks, semantic categories, and grasp type labels. Experimental results demonstrate that the proposed framework can successfully generate labeled RGB-D grasp data while enhancing the performance of lightweight VLMs on relevant task-specific submodules, underscoring its potential to accelerate research in dexterous manipulation.
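The annotation pipeline described above (SAM masks, then VLM-assigned categories and grasp types) can be sketched as a simple composition. Everything here is an assumption for illustration: `annotate_scene`, the three injected callables, and the `GraspAnnotation` record are hypothetical wrappers, not the paper's API; real use would plug in an actual SAM model and prompted VLM calls.

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class GraspAnnotation:
    mask: object       # binary segmentation mask (from the SAM wrapper)
    category: str      # semantic object category (from the VLM)
    grasp_type: str    # grasp label following the Fugl-Meyer taxonomy

def annotate_scene(rgbd_image,
                   segment: Callable,        # SAM wrapper: image -> iterable of masks
                   describe: Callable,       # VLM prompt: (image, mask) -> category
                   classify_grasp: Callable  # VLM prompt: (image, mask, category) -> grasp type
                   ) -> List[GraspAnnotation]:
    """Hypothetical sketch of the from-scratch annotation loop:
    SAM proposes object masks, then two prompted VLM calls assign a
    semantic category and a grasp-type label per segmented object."""
    annotations = []
    for mask in segment(rgbd_image):
        category = describe(rgbd_image, mask)
        grasp = classify_grasp(rgbd_image, mask, category)
        annotations.append(GraspAnnotation(mask, category, grasp))
    return annotations
```

Injecting the models as callables keeps the loop testable with stubs and agnostic to which lightweight VLM backs the two prompting steps.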
{"title":"From-scratch dexterous grasp type annotation with SAM and lightweight vision-language models","authors":"Yifan Wang ,&nbsp;Long Cheng","doi":"10.1016/j.patrec.2026.01.018","DOIUrl":"10.1016/j.patrec.2026.01.018","url":null,"abstract":"<div><div>Dexterous robotic hands enable versatile manipulation but require large annotated datasets for training, which are costly to obtain. This work presents a framework that integrates the Segment Anything Model (SAM) and small-scale vision-language models (VLMs) to automatically generate annotations from RGB-D images. Guided by the Fugl-Meyer grasp taxonomy and prompt engineering, the system produces labeled data from scratch, including object segmentation masks, semantic categories, and grasp type labels. Experimental results demonstrate that the proposed framework can successfully generate labeled RGB-D grasp data while enhancing the performance of lightweight VLMs on relevant task-specific submodules, underscoring its potential to accelerate research in dexterous manipulation.</div></div>","PeriodicalId":54638,"journal":{"name":"Pattern Recognition Letters","volume":"201 ","pages":"Pages 145-151"},"PeriodicalIF":3.3,"publicationDate":"2026-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146090256","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
DMAGaze: Gaze estimation using feature disentanglement and multi-scale attention
IF 3.3 3区 计算机科学 Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date : 2026-03-01 Epub Date: 2026-01-13 DOI: 10.1016/j.patrec.2026.01.013
Haohan Chen , Hongjia Liu , Shiyong Lan , Wenwu Wang , Yixin Qiao , Yao Li , Guonan Deng
Gaze estimation, which predicts gaze direction, commonly faces the challenge of interference from complex gaze-irrelevant information in face images—a key bottleneck limiting its accuracy in real-world scenarios. In this work, we propose DMAGaze, a novel gaze estimation framework that exploits information from facial images in three aspects: gaze-relevant global features (disentangled from facial image), local eye features (extracted from cropped eye patch), and head pose related features, to improve overall performance. Firstly, we design a new continuous mask-based Disentangler to separate gaze-relevant and gaze-irrelevant information in facial images through reconstructing the eye and non-eye regions using a dual-branch architecture. Furthermore, we introduce a new attention module, called Multi-Scale Global Local Attention Module (MS-GLAM), to fuse the global and local information at multiple scales via a customized attention structure, thereby further enhancing the information from the Disentangler. Finally, we combine the global gaze-relevant features, with head pose and local eye features, and pass them through the detection head for high-precision gaze estimation. Our proposed DMAGaze has been evaluated extensively on two widely used public datasets: obtaining a gaze estimation error of 3.74° on MPIIFaceGaze and 6.17° on RT-GENE, outperforming SOTA methods.
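The continuous-mask disentangler described above can be sketched as soft feature routing. The exact form is an assumption: the abstract specifies a continuous mask and a dual-branch eye/non-eye reconstruction, and the sigmoid gating below is one minimal way to realize that, with the two streams summing back to the input so both regions remain reconstructable.

```python
import numpy as np

def disentangle(feat, mask_logits):
    """Sketch of a continuous-mask disentangler (assumed form):
    a sigmoid mask softly routes features into a gaze-relevant stream
    (eye regions) and a gaze-irrelevant stream (everything else).
    The two streams sum exactly back to the input features."""
    mask = 1.0 / (1.0 + np.exp(-mask_logits))  # continuous mask in (0, 1)
    gaze_relevant = mask * feat                # feeds the eye-reconstruction branch
    gaze_irrelevant = (1.0 - mask) * feat      # feeds the non-eye branch
    return gaze_relevant, gaze_irrelevant
```

In DMAGaze the mask would be predicted by a learned network and the two streams supervised by reconstructing eye and non-eye regions; the sketch only shows the gating identity `gaze_relevant + gaze_irrelevant = feat` that makes such dual reconstruction possible.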
{"title":"DMAGaze : Gaze estimation using feature disentanglement and multi-scale attention","authors":"Haohan Chen ,&nbsp;Hongjia Liu ,&nbsp;Shiyong Lan ,&nbsp;Wenwu Wang ,&nbsp;Yixin Qiao ,&nbsp;Yao Li ,&nbsp;Guonan Deng","doi":"10.1016/j.patrec.2026.01.013","DOIUrl":"10.1016/j.patrec.2026.01.013","url":null,"abstract":"<div><div>Gaze estimation, which predicts gaze direction, commonly faces the challenge of interference from complex gaze-irrelevant information in face images—a key bottleneck limiting its accuracy in real-world scenarios. In this work, we propose DMAGaze, a novel gaze estimation framework that exploits information from facial images in three aspects: gaze-relevant global features (disentangled from facial image), local eye features (extracted from cropped eye patch), and head pose related features, to improve overall performance. Firstly, we design a new continuous mask-based Disentangler to separate gaze-relevant and gaze-irrelevant information in facial images through reconstructing the eye and non-eye regions using a dual-branch architecture. Furthermore, we introduce a new attention module, called Multi-Scale Global Local Attention Module (MS-GLAM), to fuse the global and local information at multiple scales via a customized attention structure, thereby further enhancing the information from the Disentangler. Finally, we combine the global gaze-relevant features, with head pose and local eye features, and pass them through the detection head for high-precision gaze estimation. Our proposed DMAGaze has been evaluated extensively on two widely used public datasets: obtaining a gaze estimation error of 3.74° on MPIIFaceGaze and 6.17° on RT-GENE, outperforming SOTA methods.</div></div>","PeriodicalId":54638,"journal":{"name":"Pattern Recognition Letters","volume":"201 ","pages":"Pages 109-116"},"PeriodicalIF":3.3,"publicationDate":"2026-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146038480","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Journal
Pattern Recognition Letters