
Expert Systems with Applications: Latest Articles

PDDNet: An end-to-end object detection framework for real-world plant leaf disease diagnosis
IF 7.5 | Zone 1 Computer Science | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2026-01-22 | DOI: 10.1016/j.eswa.2026.131294
Fenglei Yang, Weiyi Ma, Qiaochuan Chen, Yan Sun, Yuexing Han
Accurate detection of plant leaf diseases in complex agricultural fields remains a critical challenge, primarily stemming from cluttered natural backgrounds, multi-scale lesion variations (ranging from tiny spots to large patches), and subtle visual distinctions among disease classes. To address these issues, we present PDDNet, an end-to-end plant disease detection framework that integrates fine-grained lesion features with global contextual information via a cascade encoder-decoder architecture. In the encoder, an Enhanced Attention-based Multi-scale Aggregation (EAMA) module is developed to capture multi-scale lesion features through dual-branch spatial-channel attention fusion, enabling cross-layer interaction and contextual enhancement. The decoder incorporates a Prior-Guided Self-Attention (PGSA) mechanism, which merges positional encodings with IoU-based geometric priors to dynamically weight attention, prioritizing lesion boundaries and morphological structures. To resolve the inherent conflict between classification and localization tasks, a Multi-task Feature Decoupling Module (MFDM) is proposed to generate task-specific dynamic masks, explicitly segregating semantic features (for classification) and spatial features (for regression). Experimental results validate the superiority of PDDNet: it achieves 43.6% AP on the PlantDoc dataset (outperforming AlignDETR by 0.3%) and 81.6% AP on the Tomato Leaf Disease dataset (outperforming the state-of-the-art by 0.2%). With its high accuracy and cross-scenario robustness, PDDNet offers a practical solution for precision agriculture, facilitating automated field-level disease diagnosis and supporting data-driven crop protection strategies.
Expert Systems with Applications, Volume 312, Article 131294
Citations: 0
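The PDDNet abstract above mentions IoU-based geometric priors in its decoder attention. As a reminder of the underlying quantity, here is a minimal Python sketch of Intersection over Union for two axis-aligned boxes. The `(x_min, y_min, x_max, y_max)` box format and the function name `iou` are assumptions made for illustration; the abstract does not say how PGSA converts IoU values into attention weights, so that step is not shown.

```python
from typing import Tuple

# Assumed box format for this sketch: (x_min, y_min, x_max, y_max)
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over Union of two axis-aligned boxes."""
    # Corners of the overlap rectangle (may be empty)
    ix_min = max(a[0], b[0])
    iy_min = max(a[1], b[1])
    ix_max = min(a[2], b[2])
    iy_max = min(a[3], b[3])

    # Clamp at zero so disjoint boxes give zero intersection
    inter = max(0.0, ix_max - ix_min) * max(0.0, iy_max - iy_min)

    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter

    # Degenerate (zero-area) inputs: define IoU as 0
    return inter / union if union > 0 else 0.0
```

For two 2×2 boxes offset by one unit in each direction, the overlap is 1 and the union is 7, so `iou((0, 0, 2, 2), (1, 1, 3, 3))` evaluates to 1/7.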
An interpretable evaluation framework for complex systems integrating network science and data analysis
IF 7.5 | Zone 1 Computer Science | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2026-01-22 | DOI: 10.1016/j.eswa.2026.131317
Zhaoqi Fan, Yuting Wang, Zhiqiang Cai, Zhen He, Shubin Si
Complex systems are increasingly prevalent across a wide range of sectors, including healthcare, industrial production, and financial risk management. However, existing evaluation methods often struggle to provide objective and interpretable evaluations when dealing with high-dimensional, diverse, and nonlinear systems. This paper proposes a novel Network Comprehensive Evaluation (NCE) method that integrates statistical modeling techniques with complex network theory, offering a new framework for the comprehensive evaluation of complex systems. The NCE method is grounded in well-established mathematical and information technologies, ensuring both theoretical rigor and interpretability throughout the evaluation process. Unlike traditional approaches that rely on manually defined evaluation indicators and systems, or emerging methods such as machine learning and deep learning that often lack integration with domain-specific knowledge, the NCE method enhances both the reliability and applicability of evaluation results. The method is demonstrated through case studies in diverse fields, including the diagnosis of hepatitis, the performance evaluation of aero-engines, and the risk evaluation of financial systems, illustrating its effectiveness and broad applicability across different complex systems.
Expert Systems with Applications, Volume 310, Article 131317
Citations: 0
Federated self-Expanding neural network learning framework for heterogeneous devices
IF 7.5 | Zone 1 Computer Science | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2026-01-21 | DOI: 10.1016/j.eswa.2026.131199
Rong Xie, Zhong Chen, Weiguo Cao, Haosen Wang
Federated learning enables collaborative training without sharing raw data, while addressing growing privacy concerns. Real deployments face wide device heterogeneity that undermines both efficiency and accuracy in multi-sensor information fusion. We present FSENNL, a federated framework with a self-expanding neural network that adapts model capacity to each device. It adjusts capacity dynamically while leaving communication unchanged. A natural extension score combines Fisher information with device profiles to decide when and where to expand. An adaptive regularization term stabilizes newly added units and prevents over-extension. To align structurally diverse models during aggregation, an adaptive pruning compensation step uses Optimal Brain Surgeon with lightweight compensation data to recover accuracy after alignment. Knowledge distillation with an asynchronous fusion protocol mitigates straggler effects from uneven training speeds. Decoupling update frequency through teacher and student roles supports timely aggregation and cross-device knowledge transfer while preserving convergence. Experiments across heterogeneous settings show consistent accuracy with improved resource use, and demonstrate that the method scales to large federations. FSENNL provides a practical solution for multi-sensor information fusion in federated systems, delivering scalable and efficient models under diverse computational constraints.
Expert Systems with Applications, Volume 311, Article 131199
Citations: 0
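The FSENNL abstract above relies on knowledge distillation between teacher and student roles. The sketch below shows one widely used form of a distillation loss: the KL divergence between temperature-softened teacher and student distributions, scaled by the squared temperature. This is an assumption about the general technique only; the abstract does not state which loss FSENNL uses, and the function names and default temperature here are hypothetical.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float], temperature: float = 1.0) -> List[float]:
    """Numerically stable softmax of logits divided by a temperature."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the max to avoid overflow in exp
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(
    student_logits: Sequence[float],
    teacher_logits: Sequence[float],
    temperature: float = 2.0,  # hypothetical default, not from the paper
) -> float:
    """KL(teacher || student) on softened distributions, times T^2."""
    p = softmax(teacher_logits, temperature)  # soft targets
    q = softmax(student_logits, temperature)  # student predictions
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return temperature ** 2 * kl
```

The loss is zero when student and teacher logits agree and grows as the softened distributions diverge. The `T^2` factor is the usual correction that keeps gradient magnitudes comparable across temperatures.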
Robust COVID-19 detection from cough sounds using deep neural decision tree and forest: A comprehensive cross-datasets evaluation
IF 7.5 | Zone 1 Computer Science | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2026-01-21 | DOI: 10.1016/j.eswa.2026.131235
Rofiqul Islam, Nihad Karim Chowdhury, Muhammad Ashad Kabir
This research presents a robust approach to classifying COVID-19 cough sounds using cutting-edge machine learning techniques. Leveraging deep neural decision trees and deep neural decision forests, our methodology demonstrates consistent performance across diverse cough sound datasets. We begin with a comprehensive extraction of features to capture a wide range of audio features from individuals, whether COVID-19 positive or negative. To determine the most important features, we use recursive feature elimination along with cross-validation. Bayesian optimization fine-tunes hyper-parameters of deep neural decision tree and deep neural decision forest models. Additionally, we integrate the synthetic minority over-sampling technique during training to ensure a balanced representation of positive and negative data. Model performance refinement is achieved through threshold optimization, maximizing the ROC-AUC score. Our approach undergoes a comprehensive evaluation in five datasets: Cambridge (asymptomatic and symptomatic), Coswara, COUGHVID, Virufy, and the combined Virufy with the NoCoCoDa dataset. Consistently outperforming state-of-the-art methods, our proposed approach yields notable AUC scores of 0.97, 0.98, 0.92, 0.93, 0.99, and 0.99, alongside remarkable precision scores of 1, 1, 0.72, 0.93, 1, and 1 across the respective datasets. Merging all datasets into a combined dataset, our method, using a deep neural decision forest classifier, achieves an accuracy of 0.97, AUC of 0.97, precision of 0.95, recall of 0.96, F1-score of 0.96, and specificity score of 0.97. Also, our study includes a comprehensive cross-datasets analysis, revealing demographic and geographic differences in the cough sounds associated with COVID-19. These differences highlight the challenges in transferring learned features across diverse datasets and underscore the potential benefits of dataset integration, improving generalizability and enhancing COVID-19 detection from audio signals. 
The code used to generate the reported results is available at https://github.com/Rofiquldk1/COVID-19-Detection-from-Cough-Sound
Expert Systems with Applications, Volume 310, Article 131235
Citations: 0
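The cough-sound abstract above reports accuracy, precision, recall, F1-score, and specificity. For readers comparing those figures, the following Python sketch computes the same quantities from binary confusion-matrix counts using their standard definitions. The function name and the example counts are hypothetical and are not taken from the paper or its repository.

```python
from typing import Dict


def binary_metrics(tp: int, fp: int, tn: int, fn: int) -> Dict[str, float]:
    """Standard binary classification metrics from confusion counts."""

    def ratio(num: float, den: float) -> float:
        # Return 0.0 instead of dividing by zero when a class is absent
        return num / den if den else 0.0

    return {
        "accuracy": ratio(tp + tn, tp + fp + tn + fn),
        "precision": ratio(tp, tp + fp),      # positive predictive value
        "recall": ratio(tp, tp + fn),         # sensitivity
        "specificity": ratio(tn, tn + fp),    # true negative rate
        "f1": ratio(2 * tp, 2 * tp + fp + fn),
    }
```

A precision of 1, as reported for several datasets, means no false positives at the chosen threshold (`fp == 0`); it says nothing by itself about missed positives, which is why recall and specificity are reported alongside it.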
Physics-guided neural surrogate model for hydraulic cylinder condition assessment and clearance prediction
IF 7.5 | Zone 1 Computer Science | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2026-01-21 | DOI: 10.1016/j.eswa.2026.131262
Donglai Li, Jianying Li, Xiaoyan Du, Jiafu Li, Tiefeng Li
The radial clearance of a hydraulic cylinder sealing pair is critical to system performance, and its prediction accuracy affects operational reliability. To address the inherent limitations of traditional measurement approaches and the limited interpretability of data-driven models, this study develops a Mean-Flow guided neural network for quantitative prediction of clearance. Based on the kinetic energy balance in a control volume, an energy partition function is derived. Criteria for the critical pressure drop and critical clearance are further established, theoretically elucidating the energy conversion mechanism in the flow field. The network adopts a dual-branch structure that fuses mechanism features with simulation data. A physical consistency constraint is added to the loss function to keep predictions aligned with the theoretical energy partition behavior. The results show that the model follows the theoretical trend, with a mean relative error of 2.3% and 95% limits of agreement of ±0.17 mm. This confirms the value of incorporating fluid mechanism constraints into the network and forms a framework for interpretable and accurate clearance prediction in hydraulic systems.
Expert Systems with Applications, Volume 310, Article 131262
Citations: 0
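The hydraulic-cylinder abstract above summarizes accuracy with a mean relative error and 95% limits of agreement. The Python sketch below shows how such figures are commonly computed, assuming the Bland-Altman convention (mean difference ± 1.96 sample standard deviations); the abstract does not confirm that this exact convention was used, and the function names and sample values are hypothetical.

```python
import statistics
from typing import Sequence, Tuple


def mean_relative_error(pred: Sequence[float], true: Sequence[float]) -> float:
    """Average of |pred - true| / |true| over all samples."""
    return sum(abs(p - t) / abs(t) for p, t in zip(pred, true)) / len(true)


def limits_of_agreement(
    pred: Sequence[float], true: Sequence[float]
) -> Tuple[float, float]:
    """Bland-Altman 95% limits: bias +/- 1.96 * sample std of differences."""
    diffs = [p - t for p, t in zip(pred, true)]
    bias = statistics.mean(diffs)
    spread = 1.96 * statistics.stdev(diffs)  # stdev uses n - 1 in the denominator
    return bias - spread, bias + spread
```

With zero bias the limits are symmetric, which matches the "±" form quoted in the abstract; a non-zero bias would shift both limits by the same amount.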
Hybrid adaptive neighborhood multi-objective differential evolution algorithm improved by DBSCAN for hot rolling scheduling
IF 7.5 | Zone 1 Computer Science | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2026-01-21 | DOI: 10.1016/j.eswa.2026.131167
Wenchao Deng, Guodong Zhao, Qingxin Guo, Yun Dong
This paper investigates a multi-objective optimization problem for hot rolling scheduling in the steel industry, aiming to coordinate the scheduling of stock slabs and continuous casting slabs while considering operational objectives driven by waiting times and total changeover costs. Specifically, a multi-objective optimization algorithm is proposed that combines evolutionary algorithms with machine learning to simultaneously minimize the total changeover costs of slabs and the waiting times of continuously cast slabs. In its evolutionary component, the algorithm employs multi-objective differential evolution (MODE) with adaptive neighborhoods, parameter tuning, and strategies that improve solution quality and help avoid local optima. The machine learning component then introduces clustering algorithms to guide the evolutionary direction, accelerate convergence, improve diversity, and optimize the search path. Numerical experiments show that the proposed algorithm has a significant advantage over other multi-objective evolutionary algorithms in terms of solution space exploration capability.
Expert Systems with Applications, Volume 309, Article 131167
Citations: 0
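The hot rolling abstract above builds on multi-objective differential evolution. The Python sketch below illustrates two generic building blocks of that family: Pareto dominance for minimization, and the classic DE/rand/1 mutation on real-valued vectors. It is not the paper's algorithm: the adaptive neighborhoods and DBSCAN guidance are not shown, all function names are hypothetical, and a scheduling problem would also need a decoding step from real vectors to slab sequences, which is assumed away here.

```python
import random
from typing import List, Sequence


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if a is no worse than b everywhere and strictly better once (minimization)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def non_dominated(points: Sequence[Sequence[float]]) -> List[Sequence[float]]:
    """Keep the points that no other point dominates (the Pareto front)."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


def de_rand_1(
    population: Sequence[Sequence[float]],
    target_index: int,
    scale: float,
    rng: random.Random,
) -> List[float]:
    """DE/rand/1 mutant: x_r1 + scale * (x_r2 - x_r3), with r1, r2, r3 distinct."""
    # Pick three distinct individuals, none of them the target itself
    others = [j for j in range(len(population)) if j != target_index]
    r1, r2, r3 = rng.sample(others, 3)
    dim = len(population[target_index])
    return [
        population[r1][k] + scale * (population[r2][k] - population[r3][k])
        for k in range(dim)
    ]
```

Here each objective vector could hold, for example, (total changeover cost, total waiting time) for one candidate schedule, both to be minimized as in the abstract.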
SegEdit: image editing via semantic mask segmentation and shape injection within diffusion model
IF 7.5 | Zone 1 Computer Science | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2026-01-21 | DOI: 10.1016/j.eswa.2026.131222
Ruikang Liu, Qingwen Xie, Nan Jiang, Zha Cheng, Zhaohui Yuan, Xiaohui Huang, Mengfei Duan
Diffusion-based image editing technology has become a research hotspot, utilizing textual semantics and conditional masks as constraints to achieve personalized image editing. However, existing solutions predominantly rely on manual annotation or offline processing for mask segmentation, severely limiting editing efficiency. Furthermore, existing mask constraints are primarily used to limit the editing area of the image, and are unable to control the shape of the generated object. To address these challenges, a zero-shot, training-free personalized image editing method called SegEdit is proposed. It effectively integrates mask segmentation and image editing capabilities within diffusion models through a novel noise-replacement-based attention extraction method, a hybrid attention mechanism, and a shape control strategy. The noise-replacement-based image inversion strategy quickly extracts approximately optimal solutions for the attention maps at each time step of the denoising process, greatly improving computational efficiency. The hybrid attention mechanism integrates the semantic features from cross-attention maps and the spatial detail features from self-attention maps to enhance the mask segmentation. The shape control mechanism combines mask appearance constraints with self-attention injection strategies to ensure that the generated target appearance is consistent with the masked appearance. Extensive validation was conducted on three datasets. The results demonstrate that SegEdit achieves optimal performance in terms of visual quality, mask segmentation and background preservation, while also exhibiting improved inference speed.
Expert Systems with Applications, Volume 310, Article 131222
Citations: 0
Lightweight image super resolution method inspired by memory consolidation mechanism
IF 7.5 Zone 1 (Computer Science) Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date: 2026-01-21 DOI: 10.1016/j.eswa.2026.131293
Liangliang Chen, Peng Wang, Yuze Wang, He Jiang, Deqiang Cheng, Qiqi Kou
Single Image Super Resolution (SISR) aims to reconstruct High Resolution (HR) images from Low Resolution (LR) inputs. While deep learning methods have achieved significant progress, increasing network depth often leads to the attenuation of high-frequency details and the forgetting of earlier features. Inspired by the concept of memory consolidation, we propose the Memory Consolidation network for Super Resolution (MCSR). The core of MCSR is the Reactivation Feature Architecture (RFA), which operates at two hierarchical levels: at the global level, RFA organizes Memory Reactivation Blocks (MRBs) in a reverse, layer-by-layer manner for network-wide feature consolidation; at the local level within each MRB, RFA similarly organizes cascaded Memory Activation Blocks (MABs) to create bidirectional processing pathways, using deep features with broader receptive fields to progressively reactivate latent shallow details. Each MAB incorporates Memory Units (MU) and Memory Variability Enhanced Attention (MVEA) to selectively emphasize and stabilize high-frequency features through variability-driven modulation. Furthermore, the Multi-source Memories Integration Attention (MMIA) module adaptively fuses features from different network stages into a coherent reconstruction signal. Extensive experiments show that MCSR outperforms state-of-the-art methods in detail reconstruction. Specifically, it exceeds HSRNet by 0.32 dB in Peak Signal-to-Noise Ratio (PSNR) on Urban100 (×4) while reducing parameters and FLOPs by 34.4% and 22.3%, respectively.
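For readers unfamiliar with the PSNR figure quoted in this abstract, the following is a minimal sketch of the standard PSNR definition, assuming 8-bit images (peak value 255) stored as nested lists. The function name `psnr` and the toy pixel values are illustrative only; this is not the authors' evaluation code, and super-resolution benchmarks commonly add steps (such as evaluating on the luminance channel) that are omitted here.

```python
import math


def psnr(reference, reconstructed, peak=255.0):
    """Peak Signal-to-Noise Ratio in dB for two equally sized grayscale images.

    Images are nested lists of pixel values; `peak` is the maximum possible
    pixel value (255 is an assumption for 8-bit data).
    """
    ref = [p for row in reference for p in row]
    rec = [p for row in reconstructed for p in row]
    if not ref or len(ref) != len(rec):
        raise ValueError("images must be non-empty and the same size")
    # Mean squared error over all pixels.
    mse = sum((a - b) ** 2 for a, b in zip(ref, rec)) / len(ref)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(peak ** 2 / mse)


# Toy 3x3 example (hypothetical pixel values).
hr = [[52, 55, 61], [59, 79, 61], [85, 76, 62]]
sr = [[54, 55, 60], [59, 77, 61], [85, 76, 65]]
print(f"PSNR: {psnr(hr, sr):.2f} dB")
```

Because PSNR is logarithmic in the mean squared error, a 0.32 dB gain at a fixed peak value corresponds to roughly 7% lower MSE.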
Citations: 0
Universum graph-embedded kernel-based weighted extreme learning machine for handling class imbalance
IF 7.5 Zone 1 (Computer Science) Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date: 2026-01-21 DOI: 10.1016/j.eswa.2026.131243
Bhagat Singh Raghuwanshi
In imbalanced classification problems, accurately classifying the minority class has been widely studied within the machine learning community. Standard Extreme Learning Machine (ELM) classifiers are inherently biased toward the majority class due to skewed class distributions. To mitigate this limitation, several ELM-based variants have been proposed, including Kernelized Weighted ELM (KWELM) and Graph-Embedded Kernelized Weighted ELM (GEKWELM). Among these, Kernelized ELM (KELM) demonstrates superior generalization performance compared with conventional ELM by leveraging kernel mapping. Universum learning provides valuable information about the underlying data distribution and decision boundaries, making it particularly effective for imbalanced learning scenarios. Universum samples are training samples from the same domain that do not belong to any of the target classes. Motivated by these advantages, this paper proposes a novel hybrid framework termed Universum-based Graph-Embedded Kernelized Weighted Extreme Learning Machine (UGEKWELM). For the first time, UGEKWELM integrates Universum learning with GEKWELM to jointly exploit prior distributional knowledge and intrinsic geometric structure. The proposed model employs a graph embedding strategy to preserve local data manifold information, thereby enhancing robustness and generalization in highly imbalanced learning. The effectiveness of UGEKWELM is validated using real-world imbalanced datasets from the KEEL repository. Experimental results demonstrate that UGEKWELM consistently outperforms existing methods. Specifically, it improves the average G-mean by 0.86%, 0.94%, 8.44%, 2.10%, 5.58%, 12.27%, 1.11%, 1.79%, and 7.31% over GEKWELM, UKWELM, EasyEnsemble, VWELM, WKSMOTE, RUSBoost, CSKELM, KWELM, and KELM, respectively. Furthermore, UGEKWELM achieves higher average AUC values, surpassing the same competing methods by 0.75%, 1.08%, 3.40%, 3.10%, 3.20%, 8.18%, 2.04%, 2.24%, and 8.32%, respectively.
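The G-mean reported in this abstract is, in its usual binary form, the geometric mean of sensitivity and specificity. The sketch below assumes that standard binary definition; the function name `g_mean` and the toy labels are hypothetical, and the paper's exact evaluation protocol is not given in the abstract.

```python
import math


def g_mean(y_true, y_pred, positive=1):
    """Geometric mean of sensitivity and specificity for binary labels."""
    tp = fn = tn = fp = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == positive:
            if pred == positive:
                tp += 1
            else:
                fn += 1
        else:
            if pred == positive:
                fp += 1
            else:
                tn += 1
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both classes must be present in y_true")
    sensitivity = tp / (tp + fn)  # recall on the minority (positive) class
    specificity = tn / (tn + fp)  # recall on the majority (negative) class
    return math.sqrt(sensitivity * specificity)


# Toy data: 2 minority samples (label 1) and 8 majority samples (label 0).
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
all_majority = [0] * 10                    # always predicts the majority class
better = [1, 0, 0, 0, 0, 0, 0, 0, 0, 0]    # recovers one minority sample

print(g_mean(y_true, all_majority))
print(g_mean(y_true, better))
```

A classifier that always predicts the majority class reaches 80% accuracy on this toy data yet has a G-mean of zero, which is why G-mean is preferred over accuracy for imbalanced problems.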
Citations: 0
PD-count: Prompt-driven zero-shot object counting with dynamic frequency transformation
IF 7.5 Zone 1 (Computer Science) Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date: 2026-01-21 DOI: 10.1016/j.eswa.2026.131295
Kai Liu, Jun Sang, Cheng Qian, Peng Zhu, Fa Zhu, Xiaofeng Xia, David Camacho
Zero-shot Object Counting (ZSOC) uses text prompts to count object instances of an arbitrary specified class in a query image, overcoming the dependence of traditional methods on predefined classes and large amounts of annotated data, and has broad application prospects in the open world. However, existing ZSOC methods have significant deficiencies in the deep utilization of text prompts and in the effective fusion of multi-scale visual-semantic features with global context information, resulting in inadequate cross-modal semantic alignment. To address these issues, we propose PD-Count, a prompt-driven ZSOC framework with dynamic frequency transformation, which is built on Contrastive Language-Image Pre-training (CLIP). First, weighted semantic prompt tuning injects the text prompt into different encoding layers of CLIP with learnable weights, and then fuses the weights with the visual features extracted by the DINOv2 encoder to generate visual-semantic features that focus on the target area. Second, the learnable dynamic frequency transformation module reconstructs the structure-invariant features of visual semantics and uses a multi-layer perceptron to enhance the representation of global information. Subsequently, the visual-semantic embedding information enhanced in the frequency domain is concatenated channel-wise with the intermediate similarity map to obtain the counting-tailored text-image similarity map. Finally, to promote adaptive perception of object instances across various scales, we design a prompt-driven feature enhancement module that effectively captures the cross-level multi-scale semantic features and global context information. This ultimately boosts the generalization ability of the counting decoder for unseen classes. PD-Count exhibits excellent performance on the object counting dataset FSC-147 and the cross-dataset benchmark CARPK.
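To make the idea of a text-image similarity map concrete, here is a toy sketch that scores each image patch embedding against a text embedding with cosine similarity. It is an illustration of the general CLIP-style concept only, not PD-Count's pipeline: the function names, the 2-D vectors, and the 2x2 patch grid are all hypothetical, and the real method relies on learned encoders and the modules described in the abstract.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def similarity_map(text_embedding, patch_embeddings):
    """Return an H x W grid of similarities for an H x W grid of patch vectors."""
    return [[cosine(text_embedding, patch) for patch in row]
            for row in patch_embeddings]


# Hypothetical 2-D embeddings: one text prompt and a 2x2 grid of image patches.
text = [1.0, 0.0]
patches = [
    [[1.0, 0.0], [0.0, 1.0]],
    [[1.0, 1.0], [-1.0, 0.0]],
]
grid = similarity_map(text, patches)
for row in grid:
    print([round(value, 3) for value in row])
```

Patches whose embeddings point in the same direction as the prompt score near 1, which is the signal a counting decoder can use to localize instances of the prompted class.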
Citations: 0