
Pattern Recognition Letters: Latest Publications

Generalization performance distributions along learning curves
IF 3.3 CAS Tier 3 (Computer Science) Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date: 2026-01-03 DOI: 10.1016/j.patrec.2026.01.003
O. Taylan Turan , Marco Loog , David M.J. Tax
Learning curves show the expected performance with respect to training set size. This is often used to evaluate and compare models, tune hyper-parameters, and determine how much data is needed to reach a specific performance. However, the distributional properties of performance along learning curves are frequently overlooked: generally, only an average with standard error or standard deviation is reported. In this paper, we analyze the distributions of generalization performance along learning curves. We compile a high-fidelity learning curve database, both with respect to training set size and to repetitions of the sampling for a fixed training set size. Our investigation reveals that generalization performance rarely follows a Gaussian distribution for classical classifiers, regardless of dataset balance, loss function, sampling method, or hyper-parameter tuning along the learning curve. Furthermore, we show that the choice of statistical summary (mean versus measures such as quantiles) affects the top model rankings. Our findings highlight the importance of considering different statistical measures and of using non-parametric approaches when evaluating and selecting machine learning models with learning curves.
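A hedged, minimal sketch of the kind of analysis described above, assuming a synthetic dataset, a classical classifier, and arbitrary repetition counts (none of these choices come from the paper): repeatedly resample a fixed-size training set, record test accuracy, test the score distribution for Gaussianity, and compare mean- and quantile-based summaries.

```python
# Illustrative only: not the authors' learning-curve database or pipeline.
import numpy as np
from scipy import stats
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=5000, n_features=20, random_state=0)
rng = np.random.default_rng(0)

for n_train in (50, 200, 800):                 # points along the learning curve
    scores = []
    for _ in range(200):                       # repeated sampling at fixed size
        idx = rng.choice(len(X), size=n_train, replace=False)
        mask = np.ones(len(X), dtype=bool)
        mask[idx] = False                      # hold out everything not sampled
        clf = LogisticRegression(max_iter=1000).fit(X[idx], y[idx])
        scores.append(clf.score(X[mask], y[mask]))
    scores = np.asarray(scores)
    _, p = stats.shapiro(scores)               # H0: scores are Gaussian
    print(f"n={n_train}: mean={scores.mean():.3f} "
          f"median={np.median(scores):.3f} "
          f"q10={np.quantile(scores, 0.10):.3f} shapiro_p={p:.2g}")
```

Rankings by mean and by a low quantile can disagree when the score distribution is skewed, which is the point the abstract makes about the choice of statistical summary.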
Citations: 0
Hierarchical memory-enhanced networks for student knowledge tracing
IF 3.3 CAS Tier 3 (Computer Science) Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date: 2026-01-03 DOI: 10.1016/j.patrec.2026.01.002
Huali Yang , Junjie Hu , Tao Huang , Shengze Hu , Wang Gao , Zhuoran Xu , Jing Geng
Accurate recognition of students’ knowledge states is critical for personalized education in the field of intelligent education. Knowledge tracing (KT) has emerged as an important research domain for tracing students’ knowledge states through the analysis of learning-trajectory data. However, existing KT methods tend to overlook the hierarchical nature of memory, resulting in incomplete memory transfer. To address this issue, this study proposes a novel hierarchical memory-enhanced knowledge tracing (HMEKT) method that models the hierarchical structure of memory. HMEKT consists of three modules: shallow memory, deep memory, and performance prediction. Specifically, in the shallow memory module, learning and forgetting mechanisms simulate memory growth and decay, capturing the dynamic changes in knowledge states. In the deep memory module, a dynamic memory matrix stores the student’s core knowledge system, transferring shallow memory into deep memory through enhancement and reduction gates that control memory transfer. Finally, to predict student performance, relevant knowledge states are aggregated from the knowledge-system matrix for upcoming questions. Experiments on four datasets demonstrate the effectiveness of the model, with a 1.99% AUC gain on Assistment2017 compared to state-of-the-art methods.
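A minimal, hypothetical PyTorch sketch of the gated shallow-to-deep memory transfer the abstract describes; the gate parameterization, layer sizes, and update rule are illustrative assumptions, not the paper's exact HMEKT design.

```python
import torch
import torch.nn as nn

class GatedMemoryTransfer(nn.Module):
    """Sketch: move decayed shallow memory into a slotted deep-memory matrix."""
    def __init__(self, dim: int, slots: int):
        super().__init__()
        self.decay = nn.Linear(dim, dim)      # forgetting mechanism on shallow memory
        self.enhance = nn.Linear(dim, slots)  # enhancement gate: what to write
        self.reduce = nn.Linear(dim, slots)   # reduction gate: what to erase

    def forward(self, shallow: torch.Tensor, deep: torch.Tensor) -> torch.Tensor:
        # shallow: (batch, dim); deep: (batch, slots, dim)
        shallow = torch.sigmoid(self.decay(shallow)) * shallow    # simulate decay
        write = torch.sigmoid(self.enhance(shallow))              # (batch, slots)
        erase = torch.sigmoid(self.reduce(shallow))
        deep = deep * (1 - erase.unsqueeze(-1))                   # fade old traces
        deep = deep + write.unsqueeze(-1) * shallow.unsqueeze(1)  # add new trace
        return deep

m = GatedMemoryTransfer(dim=32, slots=8)
print(m(torch.randn(4, 32), torch.zeros(4, 8, 32)).shape)  # torch.Size([4, 8, 32])
```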
Citations: 0
Frequency-selective countnet: Enhancing text-guided object counting with frequency features
IF 3.3 CAS Tier 3 (Computer Science) Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date: 2025-12-27 DOI: 10.1016/j.patrec.2025.12.014
Cheng Qian , Jiwu Cao , Ying Mao , Ruotian Zhang , Fei Long , Jun Sang
Text-guided object counting aims to estimate the number of objects described by natural language within complex visual scenes. However, existing approaches often struggle to align textual intent with diverse visual patterns, especially when target objects vary in scale, appearance, or context.
To address these limitations, we propose Frequency-Selective CountNet (FSCNet), a novel framework that integrates spatial and frequency-domain features for precise text-guided counting. FSCNet introduces a Triple-Stream Attention Fusion Module (TSAFM) that combines textual, global, and local visual features. Additionally, an Adaptive Frequency Selector (AFS) dynamically emphasizes frequency components by separately modulating the magnitude and phase spectra, preserving geometric consistency during decoding.
Extensive experiments on the FSC-147 and CARPK datasets demonstrate that FSCNet achieves state-of-the-art performance, outperforming previous best methods by 18.34% in MAE and 27.41% in RMSE on FSC-147 (Avg.) and by 5.17%/7.58% on CARPK.
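The sketch below illustrates the general idea behind an adaptive frequency selector as the abstract describes it, separately modulating the magnitude and phase spectra of a feature map before inverting the FFT; the learnable per-frequency weights are an illustrative assumption, not the published AFS module.

```python
import torch
import torch.nn as nn

class FrequencySelector(nn.Module):
    """Sketch: re-weight magnitudes and nudge phases in the frequency domain."""
    def __init__(self, h: int, w: int):
        super().__init__()
        w_rfft = w // 2 + 1                        # rfft2 keeps half the width
        self.mag_gate = nn.Parameter(torch.ones(h, w_rfft))
        self.phase_shift = nn.Parameter(torch.zeros(h, w_rfft))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, channels, h, w) real-valued feature map
        spec = torch.fft.rfft2(x)                  # complex spectrum
        mag = spec.abs() * self.mag_gate           # modulate frequency magnitudes
        phase = spec.angle() + self.phase_shift    # near-zero shifts keep geometry
        spec = torch.polar(mag, phase)             # recombine into complex spectrum
        return torch.fft.irfft2(spec, s=x.shape[-2:])

f = FrequencySelector(16, 16)
print(f(torch.randn(2, 3, 16, 16)).shape)  # torch.Size([2, 3, 16, 16])
```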
Citations: 0
PE-ViT: Parameter-efficient vision transformer with dimension-adaptive experts and economical attention
IF 3.3 CAS Tier 3 (Computer Science) Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date: 2025-12-26 DOI: 10.1016/j.patrec.2025.12.013
Qun Li , Jiru He , Tiancheng Guo , Xinping Gao , Bir Bhanu
Recent advances in Mixture of Experts (MoE) have improved the representational capacity of Vision Transformer (ViT), but most existing methods remain constrained to token-level routing or homogeneous expert scaling, overlooking the diverse representation requirements across different layers and the parameter redundancy within attention modules. To address these problems, we propose PE-ViT, a novel parameter-efficient architecture that integrates the Dimension-adaptive Mixture of Experts (DMoE) and the Selective and Shared Attention (SSA) mechanisms to improve both computational efficiency and model performance. Specifically, DMoE adaptively allocates expert dimensions through layer-wise representation analysis and incorporates shared experts to enhance parameter utilization, while SSA reduces the parameter overhead of attention by dynamically selecting attention heads and sharing query-key projections. Experimental results demonstrate that PE-ViT consistently outperforms existing MoE methods across multiple benchmark datasets.
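A hedged sketch of one way to realize the two parameter-saving ideas the abstract names for SSA, a shared query-key projection and gated selection of attention heads; the module below is an assumption for illustration, not the paper's implementation.

```python
import torch
import torch.nn as nn

class SharedQKAttention(nn.Module):
    """Sketch: one projection serves both queries and keys; heads are gated."""
    def __init__(self, dim: int, heads: int):
        super().__init__()
        self.heads, self.dh = heads, dim // heads
        self.qk = nn.Linear(dim, dim)                      # shared query-key projection
        self.v = nn.Linear(dim, dim)
        self.head_gate = nn.Parameter(torch.zeros(heads))  # sigmoid gate per head
        self.out = nn.Linear(dim, dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, n, d = x.shape
        qk = self.qk(x).view(b, n, self.heads, self.dh).transpose(1, 2)
        v = self.v(x).view(b, n, self.heads, self.dh).transpose(1, 2)
        attn = (qk @ qk.transpose(-2, -1)) / self.dh ** 0.5
        ctx = attn.softmax(-1) @ v                         # (b, heads, n, dh)
        ctx = ctx * torch.sigmoid(self.head_gate).view(1, -1, 1, 1)
        return self.out(ctx.transpose(1, 2).reshape(b, n, d))

print(SharedQKAttention(64, 8)(torch.randn(2, 10, 64)).shape)  # torch.Size([2, 10, 64])
```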
Citations: 0
Bounds on the Natarajan dimension of a class of linear multi-class predictors
IF 3.3 CAS Tier 3 (Computer Science) Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date: 2025-12-25 DOI: 10.1016/j.patrec.2025.12.012
Yanru Pan, Benchong Li
The Natarajan dimension is a crucial metric for measuring the capacity of a learning model and for analyzing the generalization ability of a classifier in multi-class classification tasks. In this paper, we present a tight upper bound on the Natarajan dimension of linear multi-class predictors based on class-sensitive feature mapping for multi-vector construction, and we give the exact Natarajan dimension when the feature dimension is 2.
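For orientation, the standard definition the bound concerns (a textbook fact, not material taken from the paper): a set $S$ is N-shattered by a hypothesis class $\mathcal{H}$ if two witness labelings disagree everywhere on $S$ and every dichotomy of $S$ is realized.

```latex
% Natarajan dimension: the multi-class analogue of the VC dimension.
\[
\exists f_0, f_1 : S \to \mathcal{Y},\quad \forall x \in S:\ f_0(x) \neq f_1(x),
\]
\[
\forall T \subseteq S\ \ \exists h \in \mathcal{H}:\quad
h(x) = f_0(x)\ \text{for}\ x \in T, \qquad
h(x) = f_1(x)\ \text{for}\ x \in S \setminus T,
\]
\[
d_N(\mathcal{H}) \;=\; \max \{\, |S| : S \text{ is N-shattered by } \mathcal{H} \,\}.
\]
```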
Citations: 0
Cross-Domain detection of AI-Generated text: Integrating linguistic richness and lexical pair dispersion via deep learning
IF 3.3 CAS Tier 3 (Computer Science) Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date: 2025-12-25 DOI: 10.1016/j.patrec.2025.12.010
Jingang Wang , Tong Xiao , Hui Du , Cheng Zhang , Peng Liu
Cross-domain detection of AI-generated text is a crucial task for cybersecurity. In practical scenarios, after being trained on one or multiple known text generation sources (source domain), a detection model must be capable of effectively identifying text generated by unknown and unseen sources (target domain). Current approaches suffer from limited cross-domain generalization due to insufficient structural adaptation to domain discrepancies. To address this critical limitation, we propose RiDis, a classification model that synergizes Linguistic Richness and Lexical Pair Dispersion for cross-domain AI-generated text detection. Through comprehensive statistical analysis, we establish Linguistic Richness and Lexical Pair Dispersion as discriminative indicators for distinguishing human-authored and machine-generated texts. Our architecture features two innovative components: a Semantic Coherence Extraction Module employing long-range receptive fields to capture linguistic richness through global semantic trend analysis, and a Contextual Dependency Extraction Module utilizing localized receptive fields to quantify lexical pair dispersion via fine-grained word association patterns. The framework further incorporates domain adaptation learning to enhance cross-domain detection robustness. Extensive evaluations demonstrate that our method achieves detection accuracy superior to state-of-the-art baselines across multiple domains, with significant performance improvements in cross-domain test scenarios.
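As loose, illustrative proxies (not the paper's feature definitions), the snippet below computes a simple lexical-richness score (type-token ratio) and a lexical-pair dispersion score based on the variability of gaps between co-occurring words.

```python
from statistics import pvariance

def type_token_ratio(tokens):
    """Share of distinct words: a crude lexical-richness proxy."""
    return len(set(tokens)) / len(tokens)

def pair_dispersion(tokens, w1, w2):
    """Variance of gaps from each w1 occurrence to the nearest following w2."""
    pos2 = [j for j, t in enumerate(tokens) if t == w2]
    gaps = []
    for i, t in enumerate(tokens):
        if t == w1:
            following = [j for j in pos2 if j > i]
            if following:
                gaps.append(following[0] - i)
    return pvariance(gaps) if len(gaps) > 1 else 0.0

text = "the cat sat on the mat while the dog sat near the cat".split()
print(f"richness: {type_token_ratio(text):.2f}")                               # 0.62
print(f"dispersion('the','sat'): {pair_dispersion(text, 'the', 'sat'):.2f}")   # 2.00
```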
Citations: 0
LIFR-Net: A lightweight hybrid neural network with feature grouping for efficient food image recognition
IF 3.3 CAS Tier 3 (Computer Science) Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date: 2025-12-25 DOI: 10.1016/j.patrec.2025.12.011
Qingshuo Sun , Guorui Sheng , Xiangyi Zhu , Jingru Song , Yongqiang Song , Tao Yao , Haiyang Wang , Lili Wang
Food image recognition based on deep learning plays a crucial role in food computing. However, its high demand for computing resources limits deployment on end devices and hinders intelligent diet and nutrition management. To address this issue, we aim to balance computational efficiency with recognition accuracy and propose a compact food image recognition model named Lightweight Inter-Group Food Recognition Net (LIFR-Net) that combines a Convolutional Neural Network (CNN) with a Vision Transformer (ViT). In LIFR-Net, a lightweight ViT module called the Lightweight Inter-group Transformer (LIT) is designed and a lightweight component named the Feature Grouping Transformer is constructed; together they efficiently extract local and global features of food images while keeping the parameter count and computational complexity low. In addition, by shuffling and fusing irregularly grouped feature maps, information exchange among channels is enhanced and the recognition accuracy of the model is improved. Extensive experiments on three commonly used public food image recognition datasets, ETHZ Food–101, Vireo Food–172, and UEC Food–256, show that LIFR-Net achieves recognition accuracies of 90.49%, 91.04%, and 74.23%, respectively, with fewer parameters and lower computation.
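The shuffling of grouped feature maps resembles the standard channel-shuffle operation; the sketch below shows the uniform-group version as a simplification of the irregular grouping described in the abstract.

```python
import torch

def channel_shuffle(x: torch.Tensor, groups: int) -> torch.Tensor:
    """Interleave channels across groups so grouped branches exchange information."""
    b, c, h, w = x.shape
    assert c % groups == 0
    # (b, groups, c//groups, h, w) -> swap group and channel axes -> flatten back
    x = x.view(b, groups, c // groups, h, w).transpose(1, 2).contiguous()
    return x.view(b, c, h, w)

x = torch.arange(8.0).view(1, 8, 1, 1)
print(channel_shuffle(x, 2).flatten().tolist())  # [0, 4, 1, 5, 2, 6, 3, 7]
```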
Citations: 0
Towards robust and reliable multi-modal 3D segmentation of multiple sclerosis lesions
IF 3.3 CAS Tier 3 (Computer Science) Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date: 2025-12-24 DOI: 10.1016/j.patrec.2025.12.008
Edoardo Coppola , Mattia Savardi , Alberto Signoroni
Accurate 3D segmentation of multiple sclerosis lesions is critical for clinical practice, yet existing approaches face key limitations: many models rely on 2D architectures or partial modality combinations, while others struggle to generalise across scanners and protocols. Although large-scale, multi-site training can improve robustness, its data demands are often prohibitive. To address these challenges, we propose a 3D multi-modal network that simultaneously processes T1-weighted, T2-weighted, and FLAIR scans, leveraging full cross-modal interactions and volumetric context to achieve state-of-the-art performance across four diverse public datasets. To tackle data scarcity, we quantify the minimal fine-tuning effort needed to adapt to individual unseen datasets and reformulate the few-shot learning paradigm at an “instance-per-dataset” level (rather than traditional “instance-per-class”), enabling the quantification of the minimal fine-tuning effort to adapt to multiple unseen sources simultaneously. Finally, we introduce Latent Distance Analysis, a novel label-free reliability estimation technique that anticipates potential distribution shifts and supports any form of test-time adaptation, thereby strengthening efficient robustness and physicians’ trust.
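A hedged sketch of what a latent-distance reliability estimate could look like in this spirit: score each input by the Mahalanobis distance of its latent features from the training-feature distribution, flagging far-away inputs as likely distribution shifts. The Gaussian statistics and scoring rule are assumptions, not the paper's Latent Distance Analysis.

```python
import numpy as np

def fit_latent_stats(train_feats: np.ndarray):
    """Fit mean and (regularized) inverse covariance of training latent features."""
    mu = train_feats.mean(axis=0)
    cov = np.cov(train_feats, rowvar=False) + 1e-6 * np.eye(train_feats.shape[1])
    return mu, np.linalg.inv(cov)

def latent_distance(feat: np.ndarray, mu: np.ndarray, prec: np.ndarray) -> float:
    """Mahalanobis distance of one latent vector from the training distribution."""
    d = feat - mu
    return float(np.sqrt(d @ prec @ d))

rng = np.random.default_rng(0)
train = rng.normal(size=(500, 16))                 # stand-in encoder features
mu, prec = fit_latent_stats(train)
in_dist = latent_distance(rng.normal(size=16), mu, prec)
shifted = latent_distance(rng.normal(loc=3.0, size=16), mu, prec)
print(f"in-distribution: {in_dist:.2f}, shifted: {shifted:.2f}")  # shifted is larger
```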
Citations: 0
Audio prompt driven reprogramming for diagnosing major depressive disorder
IF 3.3 CAS Tier 3 (Computer Science) Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date: 2025-12-24 DOI: 10.1016/j.patrec.2025.12.007
Hyunseo Kim, Longbin Jin, Eun Yi Kim
Diagnosing depression is critical due to its profound impact on individuals and associated risks. Although deep learning techniques like convolutional neural networks and transformers have been employed to detect depression, they require large, labeled datasets and substantial computational resources, posing challenges in data-scarce environments. We introduce p-DREAM (Prompt-Driven Reprogramming Exploiting Audio Mapping), a novel and data-efficient model designed to diagnose depression from speech data alone. p-DREAM combines two main strategies: data augmentation and model reprogramming. First, it utilizes audio-specific data augmentation techniques to generate a richer set of training examples. Next, it employs audio prompts to aid in domain adaptation. These prompts guide a frozen pre-trained transformer, which extracts meaningful features. Finally, these features are fed into a lightweight classifier for prediction. p-DREAM outperforms traditional fine-tuning and linear probing methods while requiring only a small number of trainable parameters. Evaluations on three benchmark datasets (DAIC-WoZ, E-DAIC, and AVEC 2014) demonstrate consistent improvements. In particular, p-DREAM achieves a leading macro F1 score of 0.7734 using only acoustic features. We further conducted ablation studies on prompt length, position, and initialization, confirming their importance in effective model adaptation. p-DREAM offers a practical and privacy-conscious approach for speech-based depression assessment in low-resource environments. To promote reproducibility and community adoption, we plan to release our codebase in compliance with the ethical protocols outlined in the AVEC challenges.
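A minimal sketch of prompt-driven reprogramming as the abstract outlines it: trainable prompt embeddings are prepended to the input of a frozen encoder, and only the prompts and a lightweight head are updated. The stand-in nn.TransformerEncoder and all dimensions are assumptions replacing the actual pre-trained audio backbone.

```python
import torch
import torch.nn as nn

dim, prompt_len, n_classes = 64, 8, 2
encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(dim, nhead=4, batch_first=True), num_layers=2)
for p in encoder.parameters():
    p.requires_grad = False                      # freeze the backbone

prompts = nn.Parameter(torch.randn(1, prompt_len, dim) * 0.02)  # trainable prompts
head = nn.Linear(dim, n_classes)                 # lightweight classifier

def forward(audio_feats: torch.Tensor) -> torch.Tensor:
    # audio_feats: (batch, time, dim) pre-extracted acoustic features
    b = audio_feats.size(0)
    x = torch.cat([prompts.expand(b, -1, -1), audio_feats], dim=1)
    return head(encoder(x).mean(dim=1))          # pool over time, then classify

print(forward(torch.randn(4, 50, dim)).shape)    # torch.Size([4, 2])
```

Only `prompts` and `head` would receive gradients during training, which is what keeps the trainable parameter count small.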
Citations: 0
MUSIC: Multi-coil unified sparsity regularization using inter-slice correlation for arterial spin labeling MRI denoising
IF 3.3 CAS Tier 3 (Computer Science) Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date: 2025-12-24 DOI: 10.1016/j.patrec.2025.12.009
Hangfan Liu , Bo Li , Yiran Li , Manuel Taso , Dylan Tisdall , Yulin Chang , John A Detre , Ze Wang
Arterial spin labeling (ASL) perfusion MRI stands as the sole non-invasive method to quantify regional cerebral blood flow (CBF), a crucial physiological parameter. However, ASL MRI typically suffers from a relatively low signal-to-noise ratio. In this study, we introduce a novel ASL denoising approach termed Multi-coil Unified Sparsity regularization using Inter-slice Correlation (MUSIC). While MRI, including ASL data, is routinely captured using multi-channel coils, existing denoising techniques are tailored for coil-combined data, overlooking inherent multi-channel correlations. MUSIC capitalizes on the fact that multi-channel images are primarily distinguished by coil sensitivity weighting and random noise, resulting in an intrinsic low-rank structure within the stacked multi-channel data matrix. This low rankness can be further enhanced by grouping highly correlated slices. Our approach involves adapting regularization to each slice individually, forming potentially low-rank matrices by stacking vectorized slices selected from different channels based on their Euclidean distance from the current slice under processing. Matrix rank is then approximated using the logarithm-determinant of the covariance matrix. Importantly, MUSIC operates directly on complex data, eliminating the need for separating magnitude and phase or dividing real and imaginary data, thereby minimizing information loss. The degree of low-rank regularization is controlled by the estimated noise level, achieving a balance between noise reduction and texture preservation. Experimental validation on real-world imaging data demonstrates the efficacy of MUSIC in significantly enhancing ASL perfusion quality. By effectively suppressing noise while retaining essential textural information, MUSIC holds promise for improving the utility and accuracy of ASL perfusion MRI, thus advancing neuroimaging research and clinical diagnoses.
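An illustrative low-rank sketch of the core mechanism: stack vectorized, highly correlated slices into a matrix and suppress noise by truncating small singular values at a noise-scaled threshold. The Marchenko-Pastur-style cutoff here is a common stand-in and an assumption; the paper itself uses a log-determinant surrogate of the covariance matrix and operates on complex multi-coil data.

```python
import numpy as np

def lowrank_denoise(slices: np.ndarray, noise_sigma: float) -> np.ndarray:
    """slices: (n_slices, n_voxels) matrix of vectorized, similar slices."""
    u, s, vt = np.linalg.svd(slices, full_matrices=False)
    m, n = slices.shape
    tau = noise_sigma * (np.sqrt(m) + np.sqrt(n))  # Marchenko-Pastur-style edge
    s = np.where(s > tau, s, 0.0)                  # keep only dominant components
    return (u * s) @ vt

rng = np.random.default_rng(1)
base = rng.normal(size=(1, 256))                   # one shared underlying pattern
clean = np.repeat(base, 8, axis=0)                 # eight highly correlated slices
noisy = clean + 0.3 * rng.normal(size=clean.shape)
den = lowrank_denoise(noisy, noise_sigma=0.3)
print(f"mean abs error: noisy {np.abs(noisy - clean).mean():.3f} "
      f"-> denoised {np.abs(den - clean).mean():.3f}")
```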
Citations: 0