
Pattern Recognition Letters: Latest Publications

Choroid plexus segmentation in MRI using the novel T1×FLAIR modality and PSU-Mamba: projective scan U-Mamba approach
IF 3.3 | CAS Tier 3, Computer Science | Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2026-01-25 | DOI: 10.1016/j.patrec.2026.01.024
Lia Schmid, Giuseppe M. Facchi, Francesco Agnelli, Giorgio Bocca, Luca Sacchi, Raffaella Lanzarotti
The Choroid Plexus (CP) is emerging as a biomarker for neurodegenerative diseases (NDDs) such as Alzheimer’s Disease and its precursor pathologies. However, segmentation remains challenging, especially without Contrast-Enhanced T1-weighted (CE-T1w) imaging, which is invasive and rarely used in NDDs. To address these challenges, we present three key contributions. First, we propose and validate T1×FLAIR, a novel, non-invasive modality created by gamma-corrected voxelwise multiplication of coregistered T1w and FLAIR images. Expert visual inspection confirmed that this choice enhances CP visibility while preserving standard resolution. Second, we release ChP-MRI, a high-quality MRI dataset of 168 patients with NDDs or Multiple Sclerosis, including T1w, FLAIR, and T1×FLAIR images with expert-verified CP segmentations. The dataset is multi-pathology and is accompanied by demographic details to support benchmarking. Third, we propose PSU-Mamba (Projective Scan U-Mamba), an adaptation of the U-Mamba segmentation model in which the first encoder block is a Mamba layer equipped with a PCA-based scan path derived from anatomical priors. This design enhances segmentation accuracy, maintains linear complexity, and converges faster with fewer training epochs. Experiments on ChP-MRI confirm that T1×FLAIR is a more faithful substitute for CE-T1w than T1w, and that PSU-Mamba offers systematic robustness in segmenting the CP. The source code and the dataset are available at https://github.com/phuselab/PSU_Mamba#.
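As an illustration of the modality construction described above, a minimal sketch of the gamma-corrected voxelwise product is given below, assuming the T1w and FLAIR volumes are already coregistered and loaded as NumPy arrays; the min-max normalization step and the default gamma value are assumptions made for illustration, not details taken from the paper.

```python
import numpy as np

def t1_x_flair(t1w: np.ndarray, flair: np.ndarray, gamma: float = 0.5) -> np.ndarray:
    """Gamma-corrected voxelwise product of coregistered T1w and FLAIR volumes.

    The normalization and the default gamma are illustrative choices; the
    abstract does not specify them.
    """
    t1n = (t1w - t1w.min()) / (t1w.max() - t1w.min() + 1e-8)
    fln = (flair - flair.min()) / (flair.max() - flair.min() + 1e-8)
    product = t1n * fln              # voxelwise multiplication
    return np.power(product, gamma)  # gamma correction
```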
Citations: 0
Underwater image color correction via global-local collaborative strategy
IF 3.3 | CAS Tier 3, Computer Science | Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2026-01-23 | DOI: 10.1016/j.patrec.2026.01.022
Ling Zhou, Baiqiang Yu, Hengyu Li, Wenyi Zhao, Weidong Zhang
Underwater images often suffer from color distortion, blur, and low contrast due to light scattering and absorption. To address this, we propose a color correction method for underwater images called GLCS, which leverages a global-local collaborative strategy to mitigate color distortion effectively. Specifically, we construct a weight matrix that guides the least-attenuated channel in performing global compensation for the other channels. Following this, we design a local feedback strategy that dynamically adjusts the weight matrix based on the image’s local color bias, enabling collaborative correction between the global and local components. Finally, we design a loss function that combines color difference, mean, and standard deviation disparities to control the iteration process and optimize the compensation. Extensive experiments reveal that GLCS, as a preprocessing step, effectively alleviates color distortion in underwater images and significantly enhances the visual quality and performance of subsequent image enhancement methods.
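A minimal sketch of the global compensation idea follows, under stated assumptions: a single scalar weight per channel stands in for the paper's weight matrix, and the compensation form follows common underwater color-correction practice rather than the actual GLCS formulation.

```python
import numpy as np

def global_compensation(img: np.ndarray) -> np.ndarray:
    """img: float RGB image in [0, 1], shape (H, W, 3).

    The least-attenuated channel (largest mean) compensates the others; a scalar
    weight per channel is used here purely for illustration.
    """
    means = img.reshape(-1, 3).mean(axis=0)
    ref = int(np.argmax(means))  # channel with minimal attenuation
    out = img.copy()
    for c in range(3):
        if c == ref:
            continue
        alpha = means[ref] - means[c]  # how far channel c lags behind the reference
        out[..., c] = np.clip(img[..., c] + alpha * (1 - img[..., c]) * img[..., ref], 0.0, 1.0)
    return out
```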
Citations: 0
Data augmentation in time series forecasting through inverted framework
IF 3.3 | CAS Tier 3, Computer Science | Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2026-01-23 | DOI: 10.1016/j.patrec.2026.01.019
Hongming Tan, Ting Chen, Ruochong Jin, Wai Kin Victor Chan
Currently, iTransformer is one of the most popular and effective models for multivariate time series (MTS) forecasting. Thanks to its inverted framework, iTransformer effectively captures multivariate correlations. However, the inverted framework still has some limitations: it diminishes temporal interdependency information and introduces noise when variable correlations are not significant. To address these limitations, we introduce a novel data augmentation method for the inverted framework, called DAIF. Unlike previous data augmentation methods, DAIF is the first real-time augmentation specifically designed for the inverted framework in MTS forecasting. We first define the structure of the inverted sequence-to-sequence framework, then propose two different DAIF strategies, Frequency Filtering and Cross-variation Patching, to address the existing challenges of the inverted framework. Experiments across multiple datasets and inverted models demonstrate the effectiveness of DAIF. Our code is available at https://github.com/Travistan123/time-series-daif.
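To make the Frequency Filtering strategy more concrete, here is a hedged sketch of one plausible form, a low-pass augmentation applied to a multivariate series; the cutoff ratio and where the augmentation sits inside the inverted framework are assumptions, not details from the paper.

```python
import numpy as np

def frequency_filter_augment(x: np.ndarray, keep_ratio: float = 0.8) -> np.ndarray:
    """x: array of shape (T, C) with T time steps and C variates.

    Returns a copy with the highest-frequency components removed; keep_ratio is
    an illustrative hyperparameter.
    """
    spec = np.fft.rfft(x, axis=0)
    cutoff = int(spec.shape[0] * keep_ratio)
    spec[cutoff:] = 0                                # zero out the high frequencies
    return np.fft.irfft(spec, n=x.shape[0], axis=0)  # back to the time domain
```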
Citations: 0
Voxel-MPI: Scene-adaptive multiplane images based local voxel tokenization with attention coordination for 3D scene representation
IF 3.3 | CAS Tier 3, Computer Science | Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2026-01-23 | DOI: 10.1016/j.patrec.2026.01.006
Yu Liu, Xin Ding, Qiong Liu
With the continuous optimization of learning models, 3D scene reconstruction for novel view synthesis has seen remarkable progress and rapid development in recent years. Compared with mainstream 3D reconstruction methods such as NeRF and 3DGS, the Multiplane Image (MPI) method strikes a favorable balance between computational efficiency and the preservation of global structure. To enhance scene details, some studies combine local planes with global Multilayer Perceptron (MLP) learning for MPI representation. However, the inherent global consistency of global MLP networks hinders the adaptive learning of local density information, leading to the loss of local geometric and texture details in the rendered images. To address this issue, we propose a method called Voxel-MPI, which adaptively enhances local texture representation in MPI. First, we voxelize the global MPI and encode an independent MLP network for the local MPI of each voxel, enabling adaptive learning of local scene information. However, learning each local MPI independently can lead to inconsistent rendering between blocks, causing blocky artifacts. To mitigate this, we design a Voxel Attention Block that coordinates the information learned across voxel-based local MPIs at the same depth, ensuring consistency and coherence in scene rendering. Experimental results demonstrate that our method outperforms existing methods on widely used real-world datasets.
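A conceptual sketch of the per-voxel MLP idea is shown below; the grid size, feature dimension, and layer widths are placeholders, and the Voxel Attention Block that coordinates voxels at the same depth is omitted.

```python
import torch
import torch.nn as nn

class PerVoxelMLP(nn.Module):
    """Partition an MPI feature volume into voxels, each with its own small MLP."""

    def __init__(self, grid=(4, 4, 4), feat_dim=16, hidden=32):
        super().__init__()
        self.grid = grid
        n_voxels = grid[0] * grid[1] * grid[2]
        # One independent MLP per voxel of the partitioned MPI volume.
        self.mlps = nn.ModuleList(
            nn.Sequential(nn.Linear(feat_dim, hidden), nn.ReLU(), nn.Linear(hidden, feat_dim))
            for _ in range(n_voxels)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (D, H, W, feat_dim) MPI features, D depth planes.
        D, H, W, _ = x.shape
        gd, gh, gw = self.grid
        out = x.clone()
        idx = 0
        for d in range(gd):
            for h in range(gh):
                for w in range(gw):
                    sl = (slice(d * D // gd, (d + 1) * D // gd),
                          slice(h * H // gh, (h + 1) * H // gh),
                          slice(w * W // gw, (w + 1) * W // gw))
                    out[sl] = self.mlps[idx](x[sl])  # adaptive local processing per voxel
                    idx += 1
        return out
```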
Citations: 0
FHPG: A unified framework for transformer with pruning and quantization
IF 3.3 | CAS Tier 3, Computer Science | Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2026-01-22 | DOI: 10.1016/j.patrec.2026.01.020
Ruiguo Ren
Vision transformers (ViTs) have demonstrated strong performance across various vision tasks; however, their high computational demands limit practical deployment. Although unified post-training frameworks for pruning and quantization have been applied to deep neural networks, existing methods do not explicitly integrate Fisher–Hessian information for structured pruning and quantization. To address this limitation, we propose the Fisher–Hessian particle swarm optimization–gravitational search algorithm (FHPG), a unified framework that jointly performs structured pruning and quantization to improve compression efficiency and accuracy. FHPG leverages Fisher–Hessian metrics to generate pruning masks and quantization intervals, reducing parameter redundancy and guiding quantization more effectively. In addition, a hybrid particle swarm optimization and gravitational search (PSO–GSA) strategy is incorporated to enhance optimization stability and avoid local minima. Experiments on standard vision benchmarks with transformer architectures, including DeiT and Swin, demonstrate that FHPG achieves substantial reductions in model size and inference latency while keeping the accuracy loss within approximately 1%.
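As a hedged illustration of the Fisher-based scoring mentioned above, the sketch below builds a simple unstructured importance mask from the empirical diagonal Fisher (accumulated squared gradients); the actual FHPG criterion is Fisher–Hessian based, produces structured masks and quantization intervals, and is searched with PSO–GSA, none of which is shown here.

```python
import torch

def fisher_scores(model, loss_fn, data_loader, device="cpu"):
    """Empirical diagonal Fisher: accumulate squared gradients per parameter."""
    scores = {n: torch.zeros_like(p) for n, p in model.named_parameters() if p.requires_grad}
    model.to(device).train()
    for x, y in data_loader:
        model.zero_grad()
        loss_fn(model(x.to(device)), y.to(device)).backward()
        for n, p in model.named_parameters():
            if p.grad is not None:
                scores[n] += p.grad.detach() ** 2
    return scores

def pruning_mask(scores, sparsity=0.5):
    """Keep the (1 - sparsity) fraction of weights with the largest Fisher scores."""
    flat = torch.cat([s.flatten() for s in scores.values()])
    k = max(int(flat.numel() * sparsity), 1)
    threshold = torch.kthvalue(flat, k).values  # k-th smallest score
    return {n: (s >= threshold).float() for n, s in scores.items()}
```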
Citations: 0
From-scratch dexterous grasp type annotation with SAM and lightweight vision-language models
IF 3.3 | CAS Tier 3, Computer Science | Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2026-01-17 | DOI: 10.1016/j.patrec.2026.01.018
Yifan Wang, Long Cheng
Dexterous robotic hands enable versatile manipulation but require large annotated datasets for training, which are costly to obtain. This work presents a framework that integrates the Segment Anything Model (SAM) and small-scale vision-language models (VLMs) to automatically generate annotations from RGB-D images. Guided by the Fugl-Meyer grasp taxonomy and prompt engineering, the system produces labeled data from scratch, including object segmentation masks, semantic categories, and grasp type labels. Experimental results demonstrate that the proposed framework can successfully generate labeled RGB-D grasp data while enhancing the performance of lightweight VLMs on relevant task-specific submodules, underscoring its potential to accelerate research in dexterous manipulation.
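The sketch below illustrates one plausible shape of the annotation flow, assuming the open-source segment_anything package for mask generation; query_vlm is a hypothetical placeholder for the lightweight vision-language model that assigns semantic categories and grasp-type labels, and the prompt text is invented for illustration.

```python
import cv2
from segment_anything import sam_model_registry, SamAutomaticMaskGenerator

def annotate_image(rgb_path, sam_checkpoint, query_vlm):
    """Segment objects with SAM, then label each crop via a VLM callback."""
    image = cv2.cvtColor(cv2.imread(rgb_path), cv2.COLOR_BGR2RGB)
    sam = sam_model_registry["vit_b"](checkpoint=sam_checkpoint)
    masks = SamAutomaticMaskGenerator(sam).generate(image)
    annotations = []
    for m in masks:
        x, y, w, h = m["bbox"]          # SAM reports boxes in XYWH format
        crop = image[y:y + h, x:x + w]
        # query_vlm is a hypothetical callable standing in for the lightweight VLM.
        label = query_vlm(crop, prompt="Name this object and the grasp type needed to pick it up.")
        annotations.append({"mask": m["segmentation"], "bbox": m["bbox"], "label": label})
    return annotations
```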
Citations: 0
Fine-tuning ImageNet-pretrained models in medical image classification: Reassessing the impact of different factors
IF 3.3 | CAS Tier 3, Computer Science | Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2026-01-16 | DOI: 10.1016/j.patrec.2026.01.017
Juan Miguel Valverde, Vandad Imani, Jussi Tohka
Fine-tuning ImageNet-pretrained convolutional neural networks is a widely used strategy in medical image classification. Previous studies investigating the benefits of ImageNet pretraining over training from scratch have resulted in conflicting findings, likely due to lack of standardization in the experiments. Here, we identify various factors that were previously overlooked, and we propose a set of standardized experiments that account for these factors and that contribute to clarifying whether pretraining on ImageNet is truly advantageous. Our experiments revealed that dataset-independent factors (training set size, training time, and model size) cannot predict whether ImageNet pretraining will be beneficial. This is because the benefits of ImageNet pretraining depend on other, dataset and implementation specific, factors such as task difficulty and model architecture. We conclude that past demonstrations of the effectiveness of ImageNet pretraining are not universal, and that the potential advantages of ImageNet pretraining should be empirically evaluated in each scenario separately.
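For readers who want the comparison in code, a minimal sketch of the two training setups is shown below, using a torchvision ResNet-18 as an example backbone; the backbone choice and the two-class head are assumptions for illustration, not the paper's experimental protocol.

```python
import torch.nn as nn
from torchvision.models import resnet18, ResNet18_Weights

def build_model(num_classes: int, imagenet_pretrained: bool) -> nn.Module:
    """Same architecture, either ImageNet-initialized or randomly initialized."""
    weights = ResNet18_Weights.IMAGENET1K_V1 if imagenet_pretrained else None
    model = resnet18(weights=weights)
    model.fc = nn.Linear(model.fc.in_features, num_classes)  # replace the 1000-class head
    return model

finetuned = build_model(num_classes=2, imagenet_pretrained=True)      # fine-tune from ImageNet
from_scratch = build_model(num_classes=2, imagenet_pretrained=False)  # train from scratch
```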
Citations: 0
Hybrid attention triple branch transformer net for underwater image enhancement
IF 3.3 | CAS Tier 3, Computer Science | Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2026-01-13 | DOI: 10.1016/j.patrec.2026.01.014
Shaohui Jin, Guangpeng Li, Ziqin Xu, Yanxin Zhang, Zhengguang Qin, Hao Liu, Mingliang Xu
In real underwater scenes, the complexity of the environment leads to issues such as light attenuation, scattering, and color distortion, resulting in reduced image quality and loss of detail. To resolve these problems, we propose a hybrid attention triple branch transformer network (HATBformer). The backbone network adopts a three-layer encoder-decoder structure, making full use of the spatial and channel feature information of underwater images and improving the network’s focus on color information and on spatial regions with higher levels of attenuation. The detail enhancement branch incorporates a coordinate information perception mechanism and a feature integration strategy through three consecutive feature enhancement blocks, aiming to deeply repair and optimize image details and effectively improve image reconstruction quality. In addition, we establish an underwater image dataset, NLOS-TW, that covers different optical thicknesses and includes rich targets and various underwater scenes. Extensive experiments demonstrate that our method significantly enhances image quality and surpasses current state-of-the-art methods both qualitatively and quantitatively.
Citations: 0
Clustering criteria: What defines a good cluster?
IF 3.3 | CAS Tier 3, Computer Science | Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2026-01-13 | DOI: 10.1016/j.patrec.2026.01.011
Jinli Yao, Yong Zeng
Clustering is a fundamental technique in unsupervised learning, enabling the discovery of patterns and natural groupings in data without prior labels. Despite its widespread applications across domains, the field of clustering faces persistent challenges, including a lack of universally accepted definitions, inconsistent classification criteria, and varying evaluation metrics. This review paper addresses these gaps by exploring the core question: what defines a good cluster? We investigate and summarize the induction principles behind clustering problems, clustering algorithms, and evaluation indices. The paper classifies clustering algorithms based on their criteria and principles, providing a structured understanding of their methodologies. It further categorizes datasets into synthetic and real-world examples, identifying the challenges posed by diverse cluster characteristics, such as varying shapes, densities, sizes, and overlapping cases, alongside high dimensionality. A comprehensive review of evaluation indices, grouped into compactness, connectedness, and separation types, highlights their importance in assessing clustering quality. By consolidating these aspects, this review provides a cohesive framework for understanding clustering principles and their applications.
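As a small worked example of the index families named above, the snippet below computes a compactness score (within-cluster sum of squares), a separation score (closest pair of cluster centers), and the silhouette coefficient on toy data; the dataset and clustering choices are purely illustrative.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

X, _ = make_blobs(n_samples=300, centers=3, random_state=0)
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)

centers = np.array([X[labels == k].mean(axis=0) for k in range(3)])
compactness = sum(((X[labels == k] - centers[k]) ** 2).sum() for k in range(3))  # within-cluster SSE
separation = min(np.linalg.norm(centers[i] - centers[j])
                 for i in range(3) for j in range(i + 1, 3))                      # closest center pair
silhouette = silhouette_score(X, labels)                                          # balances both aspects

print(f"compactness={compactness:.1f}, separation={separation:.2f}, silhouette={silhouette:.3f}")
```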
Citations: 0
DMAGaze: Gaze estimation using feature disentanglement and multi-scale attention
IF 3.3 | CAS Tier 3, Computer Science | Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2026-01-13 | DOI: 10.1016/j.patrec.2026.01.013
Haohan Chen, Hongjia Liu, Shiyong Lan, Wenwu Wang, Yixin Qiao, Yao Li, Guonan Deng
Gaze estimation, which predicts gaze direction, commonly faces the challenge of interference from complex gaze-irrelevant information in face images, a key bottleneck limiting its accuracy in real-world scenarios. In this work, we propose DMAGaze, a novel gaze estimation framework that exploits information from facial images in three aspects: gaze-relevant global features (disentangled from the facial image), local eye features (extracted from cropped eye patches), and head-pose-related features, to improve overall performance. First, we design a new continuous mask-based Disentangler to separate gaze-relevant and gaze-irrelevant information in facial images by reconstructing the eye and non-eye regions using a dual-branch architecture. Furthermore, we introduce a new attention module, called the Multi-Scale Global Local Attention Module (MS-GLAM), to fuse global and local information at multiple scales via a customized attention structure, thereby further enhancing the information from the Disentangler. Finally, we combine the global gaze-relevant features with head pose and local eye features, and pass them through the detection head for high-precision gaze estimation. Our proposed DMAGaze has been evaluated extensively on two widely used public datasets, achieving a gaze estimation error of 3.74° on MPIIFaceGaze and 6.17° on RT-GENE, outperforming state-of-the-art methods.
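A short sketch of how angular errors like those reported above (e.g. 3.74° on MPIIFaceGaze) are conventionally computed is given below; this is the standard evaluation metric, not code from the paper.

```python
import numpy as np

def mean_angular_error(pred: np.ndarray, gt: np.ndarray) -> float:
    """pred, gt: arrays of shape (N, 3) holding 3D gaze direction vectors."""
    pred = pred / np.linalg.norm(pred, axis=1, keepdims=True)
    gt = gt / np.linalg.norm(gt, axis=1, keepdims=True)
    cos = np.clip(np.sum(pred * gt, axis=1), -1.0, 1.0)  # cosine of the angle between vectors
    return float(np.degrees(np.arccos(cos)).mean())
```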
Citations: 0