EduYOLO: A classroom behavior recognition framework based on high-resolution feature attention fusion
Pub Date : 2026-01-28 | DOI: 10.1016/j.eswa.2026.131370
Jun Yu , Shengzhao Li , Huijie Liu , Qi Liu , Chang Tan , Zhiyuan Cheng , Jinze Wu
Process-oriented evaluation of classroom instruction is vital for assessing student learning quality and teacher instructional effectiveness. In recent years, object detection-based methods have been widely applied to classroom behavior recognition, yet they struggle with the unique challenges of real-world classrooms: small student objects due to distant cameras, frequent occlusions, and subtle, fine-grained behaviors such as “Gaze” and “Turn”. To address these issues, this paper proposes EduYOLO, a novel classroom behavior recognition framework based on a High-Resolution Feature Attention Fusion (HRFAF) module and architected around three dedicated components: a Key Region Perception Backbone that enhances the representation of crucial action regions, a Fine-Grained Action Modeling Neck that captures intricate behavioral patterns, and a High-Resolution Prediction Head that significantly improves small object detection. This holistic design synergistically strengthens the model's ability to perceive local details and complex postures. Furthermore, we design the FM-IoU loss function for bounding box regression, integrating focal weighting and multi-point distance constraints to enhance localization stability. Extensive experiments on the self-constructed CSCB-Dataset and on SCB-Data3 demonstrate that EduYOLO achieves superior detection accuracy and generalization performance compared with existing methods, confirming its effectiveness and robustness for real-world classroom behavior recognition. To support reproducible research, our code is available at: https://github.com/datadance/EduYolo.
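For readers unfamiliar with the ingredients named above, the following PyTorch sketch shows the general recipe of an IoU-based regression loss that combines a focal-style weight with multi-point distance penalties. The function name, the gamma value, the choice of corner and centre points, and the direction of the focal weighting are illustrative assumptions, not the authors' FM-IoU definition.

import torch

def fm_iou_loss_sketch(pred, target, gamma=0.5, eps=1e-7):
    """pred, target: (N, 4) boxes in (x1, y1, x2, y2) format."""
    # Plain IoU between matched boxes.
    x1 = torch.max(pred[:, 0], target[:, 0])
    y1 = torch.max(pred[:, 1], target[:, 1])
    x2 = torch.min(pred[:, 2], target[:, 2])
    y2 = torch.min(pred[:, 3], target[:, 3])
    inter = (x2 - x1).clamp(min=0) * (y2 - y1).clamp(min=0)
    area_p = (pred[:, 2] - pred[:, 0]) * (pred[:, 3] - pred[:, 1])
    area_t = (target[:, 2] - target[:, 0]) * (target[:, 3] - target[:, 1])
    iou = inter / (area_p + area_t - inter + eps)

    # Multi-point distance term: squared distances between the two corners and the centre
    # of each box pair, normalised by the diagonal of the smallest enclosing box.
    ex1 = torch.min(pred[:, 0], target[:, 0]); ey1 = torch.min(pred[:, 1], target[:, 1])
    ex2 = torch.max(pred[:, 2], target[:, 2]); ey2 = torch.max(pred[:, 3], target[:, 3])
    diag2 = (ex2 - ex1) ** 2 + (ey2 - ey1) ** 2 + eps

    def points(b):  # top-left corner, bottom-right corner, centre -> (N, 3, 2)
        return torch.stack([b[:, :2], b[:, 2:], (b[:, :2] + b[:, 2:]) / 2], dim=1)

    dist = ((points(pred) - points(target)) ** 2).sum(-1).mean(1) / diag2

    # Focal-style weight: emphasise poorly localised (low-IoU) boxes.
    weight = (1.0 - iou).pow(gamma)
    return (weight * (1.0 - iou + dist)).mean()

# Example with two predicted boxes and their ground-truth boxes.
pred = torch.tensor([[10., 10., 50., 50.], [30., 30., 60., 80.]])
gt = torch.tensor([[12., 12., 48., 52.], [25., 35., 65., 75.]])
print(fm_iou_loss_sketch(pred, gt))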
Adaptive compressed domain video encryption
Pub Date : 2026-01-28 | DOI: 10.1016/j.eswa.2026.131360
Mohammad Ghasempour , Yuan Yuan , Hadi Amirpour , Hongjie He , Christian Timmerer
With the ever-increasing amount of digital video content, efficient encryption is crucial to protect visual content across diverse platforms. Existing methods often incur excessive bitrate overhead due to content variability. Furthermore, since most videos are already compressed, encryption in the compressed domain is essential to avoid processing overhead and re-compression quality loss. However, achieving both format compliance and compression efficiency while ensuring that the decoded content remains unrecognizable is challenging in the compressed domain, since only limited information is available without full decoding. This paper proposes an adaptive compressed domain video encryption (ACDC) method that dynamically adjusts the encryption strategy according to content characteristics. Two tunable parameters derived from the bitstream information enable adaptation to various application requirements. An adaptive syntax integrity method is employed to produce format-compliant bitstreams without full decoding. Experimental results show that ACDC reduces bitrate overhead by 48.2% and achieves a 31-fold speedup in encryption time compared to the latest state of the art, while producing visually unrecognizable outputs.
Dual-space intervention for mitigating bias in robust visual question answering
Pub Date : 2026-01-27 | DOI: 10.1016/j.eswa.2026.131346
Runmin Wang , Xingdong Song , Zukun Wan , Han Xu , Congzhen Yu , Tianming Ma , Yajun Ding , Shengyou Qian
Visual Question Answering (VQA) evaluates the visual-textual reasoning capabilities of intelligent agents. However, existing methods are often susceptible to various biases. In particular, language bias leads models to rely on spurious question-answer correlations as shortcut solutions, while distribution bias caused by dataset imbalance encourages models to overfit head classes and overlook tail classes. To address these long-standing challenges, we propose a Dual-Space Intervention (DSI) approach that tackles these two biases from a unified yet complementary perspective. Our work includes two key innovations: (1) in the input space, we adopt an adaptive question shuffling strategy that alleviates language bias by adjusting perturbation strength according to question bias, ensuring that models develop a deeper understanding of the problem context rather than relying on spurious word-answer correlations; (2) in the output space, we propose a novel label rebalancing mechanism that moderates head-class dominance based on long-tailed statistics, improving robustness to distribution bias. This mechanism reduces the disproportionately high variance in head logits relative to tail logits, improving tail-class recognition accuracy. Extensive experiments on four benchmarks (VQA-CP v1, VQA-CP v2, VQA-CE, and SLAKE-CP) demonstrate the superiority of our method, which achieves state-of-the-art accuracy of 63.14% on VQA-CP v1 and 37.61% on SLAKE-CP. The code will be released at https://github.com/songxdr3/DSI.
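The output-space rebalancing idea can be illustrated with a standard logit-adjustment sketch: head classes are penalised in proportion to their (log) training frequency. The temperature tau and the log-frequency offset below are illustrative assumptions, not the paper's actual mechanism.

import numpy as np

def rebalance_logits(logits, class_counts, tau=1.0):
    """logits: (N, C) raw answer scores; class_counts: (C,) training-set answer frequencies."""
    prior = class_counts / class_counts.sum()
    # Head classes receive the largest penalty, tail classes the smallest.
    return logits - tau * np.log(prior + 1e-12)

rng = np.random.default_rng(0)
logits = rng.normal(size=(4, 5))                  # 4 questions, 5 candidate answers
counts = np.array([5000, 1200, 300, 60, 10])      # long-tailed answer distribution
print(rebalance_logits(logits, counts).argmax(axis=1))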
PromptMed: Prompt-driven semi-supervised medical image classification with class-balanced consistency and contrastive learning
Pub Date : 2026-01-27 | DOI: 10.1016/j.eswa.2026.131345
Shuai Wang , Ruina Mao
Existing pre-trained foundation models have demonstrated strong generalization and transfer capabilities across diverse domains. However, directly fine-tuning all parameters of pre-trained models for medical image classification requires massive labeled data, making it inefficient and resource-intensive. To address this, we leverage semi-supervised learning (SSL) techniques to reduce the need for large-scale annotation and enable efficient fine-tuning. In this context, we propose PromptMed, a parameter-efficient framework for semi-supervised medical image classification that consists of three key components: Prompt Noise Injection (PNI), Class-Balanced Prompt Adaptation (CBPA), and Contrastive Feature Consistency (CFC). Specifically, we introduce PNI to enhance the robustness of prompt representations and enable effective prompt-based consistency training. PNI applies Gaussian noise of varying strengths to prompt tokens, serving as a form of representation-level augmentation. To mitigate class imbalance, we design a CBPA mechanism that dynamically assigns higher noise to minority classes based on recent class distributions, encouraging better representation learning for hard categories. Additionally, to promote feature consistency, especially for minority and visually similar classes, we apply a CFC objective to the vision-branch features. These three components work synergistically, enabling PromptMed to achieve robust, balanced, and highly discriminative medical image classification with significantly fewer trainable parameters. Extensive experiments on multiple medical image datasets demonstrate that our approach achieves state-of-the-art performance while substantially reducing the number of trainable parameters.
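A minimal sketch of class-balanced prompt noise injection, assuming an inverse-frequency schedule that maps recent class frequency to noise strength; the exact PNI/CBPA schedule is not specified in the abstract, so the function name, sigma range, and scaling rule below are assumptions for illustration only.

import torch

def inject_prompt_noise(prompt_tokens, class_freq, labels, sigma_min=0.01, sigma_max=0.1):
    """prompt_tokens: (B, P, D) prompt embeddings per sample; class_freq: (C,) recent
    class frequencies; labels: (B,) labels or pseudo-labels for the batch."""
    freq = class_freq / class_freq.sum()
    # Rarer classes get a larger sigma, i.e. stronger representation-level augmentation.
    sigma = sigma_min + (sigma_max - sigma_min) * (1.0 - freq[labels])
    noise = torch.randn_like(prompt_tokens) * sigma.view(-1, 1, 1)
    return prompt_tokens + noise

prompts = torch.zeros(4, 8, 16)                   # 4 samples, 8 prompt tokens, dim 16
freq = torch.tensor([400.0, 50.0, 10.0])          # imbalanced recent class histogram
labels = torch.tensor([0, 1, 2, 2])
noisy = inject_prompt_noise(prompts, freq, labels)
print(noisy.flatten(1).std(dim=1))                # minority-class samples show a larger spread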
Reinforcement learning-driven service allocation via potential game modeling in aerial edge computing
Pub Date : 2026-01-27 | DOI: 10.1016/j.eswa.2026.131339
Xi Liu , Jun Liu
Aerial edge computing has recently received significant research attention due to its remarkable potential for dynamically deploying computing power. We address the problem of service scheduling in aerial edge computing, in which uncrewed aerial vehicles (UAVs) are deployed to mission areas to provide sensor data collection and analysis services. Two types of sensing tasks are considered: single-zone service, in which a UAV remains in a single zone, and multiple-zone service, in which a UAV traverses several areas to collect sensing data that meet user requirements. The objective is to maximize the overall utility of the UAVs. The service scheduling problem is formulated as an ordinal potential game to achieve a stable system state, and a distributed algorithm based on reinforcement learning is proposed. An improved search-state formulation is introduced to accelerate convergence and enhance search efficiency. The proposed scheduling algorithm is shown to reach a Nash equilibrium in which no UAV can improve its utility by unilaterally deviating. Additionally, the approximation performance of the proposed algorithm and the game's price of anarchy are analyzed. The results indicate that the proposed algorithm provides higher utility to UAVs and adapts effectively to diverse distribution environments.
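The equilibrium property relied on above (in a potential game, unilateral improvement moves terminate at a Nash equilibrium) can be seen in a toy congestion-style example. The zone-sharing utility below is an assumed stand-in for the paper's task utilities, and best-response dynamics is used here only as an illustration, not as the paper's reinforcement-learning algorithm.

ZONE_VALUE = [10.0, 6.0, 3.0]      # reward available in each mission zone
N_UAVS = 4

def utility(zone, choices):
    # Zone reward is shared equally among the UAVs that selected it.
    return ZONE_VALUE[zone] / choices.count(zone)

def best_response_dynamics(choices):
    for _ in range(100):                              # iteration cap
        improved = False
        for uav in range(N_UAVS):
            current = utility(choices[uav], choices)
            for z in range(len(ZONE_VALUE)):
                trial = choices.copy()
                trial[uav] = z
                if utility(z, trial) > current + 1e-9:
                    choices, improved = trial, True
                    break
        if not improved:                              # no profitable unilateral deviation left
            return choices
    return choices

# Starting from everyone in the richest zone, the dynamics settle at [1, 0, 0, 0]:
# one UAV takes zone 1 alone (utility 6), three share zone 0 (utility 10/3 each).
print(best_response_dynamics([0, 0, 0, 0]))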
Regression test optimization for software of the cellular network base stations: A language-based approach
Pub Date : 2026-01-27 | DOI: 10.1016/j.eswa.2026.131225
Sebastian Zarębski , Krzysztof Rusek , Piotr Chołda
This paper introduces the Linear Model of Latent Dirichlet Allocation (LMLDA), a novel methodology for software test optimization that directly addresses the gap between computationally prohibitive large language models (LLMs) and semantically shallow heuristics. Our primary contribution is a lightweight, interpretable, and cost-efficient model specifically designed for high-stakes industrial Continuous Integration and Continuous Delivery (CI/CD) environments where security and traceability are essential. The novelty of LMLDA lies in its integration of Latent Dirichlet Allocation (LDA), for the semantic analysis of code modifications and test content, with a classifier trained using logistic regression yet offering predictions with the computational simplicity of linear regression. This approach predicts the probability of test failure from semantic interactions, enabling precise, bug-centric prioritization rather than relying on indirect diversity proxies. A large-scale industrial case study at NOKIA demonstrates LMLDA's practical effectiveness, achieving an average 64% reduction in test suite size while maintaining 88% precision in bug detection and accelerating critical bug discovery by an average of 8 hours per cycle.
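The topic-model-plus-classifier pipeline described above can be sketched with standard scikit-learn components. The toy corpus, the concatenation of change and test topic vectors into one feature vector, and all hyperparameters below are illustrative assumptions, not the LMLDA configuration used in the case study.

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.linear_model import LogisticRegression
import numpy as np

changes = ["fix scheduler timeout in radio link handler",
           "refactor uplink scheduler queue",
           "update ciphering key exchange for security module",
           "add logging to security audit path"]
tests   = ["scheduler stress test radio link",
           "uplink queue regression test",
           "security key exchange test",
           "audit log format test"]
failed  = np.array([1, 1, 0, 0])   # historical outcomes of the (change, test) pairs above

vec = CountVectorizer()
vec.fit(changes + tests)
lda = LatentDirichletAllocation(n_components=3, random_state=0)
lda.fit(vec.transform(changes + tests))

# Feature vector of a pair = change topic distribution concatenated with test topic distribution.
X = np.hstack([lda.transform(vec.transform(changes)),
               lda.transform(vec.transform(tests))])
clf = LogisticRegression().fit(X, failed)

# Rank tests for a new change by predicted failure probability (higher = run earlier).
new_change = ["fix race condition in uplink scheduler"]
topic_change = lda.transform(vec.transform(new_change)).repeat(len(tests), axis=0)
scores = clf.predict_proba(np.hstack([topic_change, lda.transform(vec.transform(tests))]))[:, 1]
print(sorted(zip(scores, tests), reverse=True))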
Quantum modeling of the dynamic ride-sharing problem: Development of quantum Benders decomposition methods
Pub Date : 2026-01-27 | DOI: 10.1016/j.eswa.2026.131320
Erfan Amani Bani, Kourosh Eshghi
Mathematical modeling and the subsequent development of optimization algorithms have been the core focus of operations research scientists. However, the challenge of solving complex models promptly has always sparked numerous innovations in this field. Quantum computing has been proposed as an alternative to binary computing for several decades, and in recent years operations researchers have paid special attention to applying and integrating this paradigm with optimization. Specifically, many quantum-based optimization algorithms have been developed; however, little attention has been given to modeling optimization problems using quantum variables. In this paper, a practical problem, the dynamic ride-sharing problem, is redefined and then modeled with the help of quantum variables. Because it is formulated with quantum variables, the resulting model is fully compatible with quantum algorithms. Subsequently, quantum algorithms based on Benders' decomposition are developed. Despite limited access to quantum computing hardware, the performance of the algorithms is demonstrated both theoretically, in terms of computational complexity, and on a simple worked example.
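For reference, the classical Benders structure that decomposition methods of this kind build on is summarized below; the notation is generic and illustrative, not taken from the paper, and it does not reflect the quantum-variable formulation.

% Original problem: min_{x in X, y >= 0} c^T x + d^T y  s.t.  A x + B y >= b
\begin{align*}
\text{Subproblem (fix } \hat{x}\text{):}\quad
  & \phi(\hat{x}) = \min_{y \ge 0} \{\, d^\top y : B y \ge b - A\hat{x} \,\}\\
\text{Master problem:}\quad
  & \min_{x \in X,\, \eta}\; c^\top x + \eta\\
  & \text{s.t.}\quad \eta \ge \pi_k^\top (b - A x) \quad \text{(optimality cuts, dual extreme points $\pi_k$)}\\
  & \phantom{\text{s.t.}\quad} 0 \ge \rho_j^\top (b - A x) \quad \text{(feasibility cuts, dual extreme rays $\rho_j$)}
\end{align*}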
DP-HM2F: Data-driven LoRA with dual-projection representation for heterogeneous multimodal federated fine-tuning
Pub Date : 2026-01-27 | DOI: 10.1016/j.eswa.2026.131287
Yu Yang , Suxia Zhu , Guanglu Sun , Zian He , Xinyu Liu , Kai Zhou , Xiaojuan Cui
Federated learning (FL) enables privacy-preserving fine-tuning of multimodal large language models (MLLMs) on edge devices; however, the limited computational resources of edge clients, coupled with inherent modality and data heterogeneity across clients, pose major challenges for federated multimodal fine-tuning and lead to performance degradation. To tackle these issues, we propose DP-HM2F, a data-driven LoRA framework with a dual-projection representation mechanism for heterogeneous multimodal federated fine-tuning. Specifically, DP-HM2F establishes a dual-projection architecture that exploits a global feature pool and client-specific local feature pools, where the global pool encodes privacy-agnostic shared representations and each edge client dynamically maintains a local pool to refine heterogeneous multimodal representations. The architecture enables projection-based retrieval between the global and local pools to improve representation alignment, although this retrieval introduces additional computational overhead on resource-constrained devices. To mitigate this limitation, DP-HM2F integrates a data-driven LoRA module that adaptively scales the number of trainable parameters based on local data, thereby alleviating computational constraints across heterogeneous clients. Furthermore, to address semantic conflicts induced by high-dimensional representation spaces during federated aggregation, we introduce a positive-vector collaborative optimization strategy that alleviates conflicting client updates. Extensive experimental results demonstrate that DP-HM2F, with only 7.05% of trainable parameters (a 0.3% reduction compared with conventional LoRA-based methods), achieves a performance improvement of 4.1 points under heterogeneous multimodal settings.
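A minimal sketch of data-driven LoRA rank selection, assuming a log-scaled rule that maps each client's local sample count to a rank (and hence to a trainable-parameter budget); DP-HM2F's actual parameterisation is not given in the abstract, so the rank rule and the plain LoRA layer below are illustrative assumptions.

import math
import torch
import torch.nn as nn

def data_driven_rank(n_local_samples, r_min=2, r_max=16):
    """More local data -> larger rank -> more trainable parameters on that client."""
    r = int(r_min + (r_max - r_min) * math.log1p(n_local_samples) / math.log1p(100_000))
    return max(r_min, min(r_max, r))

class LoRALinear(nn.Module):
    def __init__(self, base: nn.Linear, rank: int, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False                      # frozen pre-trained weight
        self.A = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, rank))
        self.scale = alpha / rank

    def forward(self, x):
        return self.base(x) + (x @ self.A.t() @ self.B.t()) * self.scale

# Two heterogeneous clients with different amounts of local multimodal data.
for n in (500, 80_000):
    layer = LoRALinear(nn.Linear(768, 768), rank=data_driven_rank(n))
    trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
    print(n, "samples -> rank", data_driven_rank(n), "->", trainable, "trainable params")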
LLM-augmented causal-knowledge heterogeneous graph framework for interpretable reasoning and collaborative knowledge fusion in automotive chip production
Pub Date : 2026-01-27 | DOI: 10.1016/j.eswa.2026.131343
Shuangxue Liu , Hongbin Xie , Yuzhen Lei , Jiaxing Zhao , Xuan Song
Automotive chip production involves complex interdependencies across design, manufacturing, and supply-chain processes, posing significant challenges for interpretable and consistent reasoning. To address these challenges, this paper proposes a Causal-Knowledge Heterogeneous Graph (C-KHG) framework that integrates a domain knowledge graph with a text-grounded causal event graph, capturing linguistically asserted cause-and-effect relations extracted from expert-authored technical documents. Unlike statistical causal discovery or interventional causal modeling, the proposed causal event graph focuses on causally informed semantic reasoning, emphasizing directional consistency and interpretability aligned with domain expert knowledge. Built upon the unified heterogeneous graph, we design a three-stage reasoning pipeline consisting of intent classification, graph-based adaptive retrieval, and large language model (LLM) answer generation. To evaluate its effectiveness, experiments were conducted on three representative tasks: (1) hybrid knowledge-causal reasoning, which jointly involves entity-level knowledge retrieval and cause-and-effect analysis; (2) value-chain question answering, which focuses on structured domain knowledge across the automotive chip lifecycle; and (3) pure causal reasoning, which concentrates exclusively on cause-and-effect relations without requiring explicit entity attributes. Instead of relying on direct prompt-based inference, we construct the causal knowledge graph as an explicit intermediate structured layer, efficiently bootstrapped by LLMs, which serves as a persistent and updatable domain memory. This design improves reasoning stability and directional consistency while facilitating knowledge maintenance and iterative updates without model retraining. Experimental results on automotive chip value-chain question answering demonstrate that the proposed framework consistently improves reasoning accuracy, causal directionality, and interpretability compared with vanilla LLMs and conventional knowledge-graph-based retrieval methods. In particular, for the causal-knowledge fusion task, the cosine similarity score of GLM4-9B improved from 9.63 to 21.75. These findings highlight the effectiveness of structured graph-based reasoning scaffolds as intermediate representations for enhancing LLM-based reasoning in complex industrial domains. Code and data are made available at https://github.com/shuangxueliu/C-KHG.
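The three-stage pipeline (intent classification, graph-based retrieval, LLM answer generation) can be sketched as a skeleton. The keyword intent rules, the toy graph, and the stubbed generate() call below are placeholders standing in for the paper's trained components, not the C-KHG implementation.

import networkx as nx

# Toy causal-knowledge graph: edges carry either a domain relation or a causal relation.
g = nx.DiGraph()
g.add_edge("wafer contamination", "die yield drop", relation="causes")
g.add_edge("die yield drop", "chip shipment delay", relation="causes")
g.add_edge("chip shipment delay", "automotive ECU", relation="affects_component")

def classify_intent(question):
    if "why" in question.lower() or "cause" in question.lower():
        return "causal"
    return "knowledge"

def retrieve(question, intent, hops=2):
    seeds = [n for n in g.nodes if n in question.lower()]
    facts = []
    for s in seeds:
        frontier = {s}
        for _ in range(hops):
            nxt = set()
            for u in frontier:
                for v in g.successors(u):
                    rel = g[u][v]["relation"]
                    if intent != "causal" or rel == "causes":   # causal intent keeps causal edges only
                        facts.append((u, rel, v))
                        nxt.add(v)
            frontier = nxt
    return facts

def generate(question, facts):
    # Stub for the LLM call: a real system would prompt the model with the retrieved triples.
    return f"Q: {question}\nGrounding triples: {facts}"

q = "Why does wafer contamination lead to chip shipment delay?"
print(generate(q, retrieve(q, classify_intent(q))))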
A quality prediction method for injection molding products based on multi-stage feature decoupling and fusion
Pub Date : 2026-01-27 | DOI: 10.1016/j.eswa.2026.131372
Xianhao Zhang, Hongfei Zhan
Injection molding is an efficient method for the mass production of plastic products, but product quality is susceptible to variations in process conditions and parameters. To improve the real-time performance and accuracy of quality control, deep learning-based data-driven prediction methods have become a research focus. Nevertheless, existing injection molding quality prediction methods tend to prematurely couple variable channels and still face limitations in information fusion and model efficiency. Therefore, this paper combines multi-source data and proposes a quality prediction method based on Multi-Stage Feature Decoupling and Fusion (MSFDF). To address the premature coupling of multivariate features in injection molding, a Temporal and Channel Decoupling Based Multi-Scale Feature Extraction Module (TC-DMFE) is designed to extract multi-scale features while maintaining feature independence. In addition, to address the inadequate integration of multi-scale information during the injection molding process, a Channel-wise Multi-scale Feature Fusion Module (CMFF) is proposed, which fully integrates multi-scale features through a channel-by-channel fusion strategy and enhances the model's comprehensive understanding of injection molding process variables under multi-scale variation patterns. On this basis, a Deep Feature Guided Channel Attention Recoupling Module (DCAR) is further constructed to learn inter-channel dependencies and apply channel weighting, achieving more effective variable recoupling. The proposed model effectively reduces training time while maintaining prediction accuracy and can quickly adapt to injection molding production scenarios.
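A minimal sketch of the decouple-then-recouple idea described above, assuming depthwise (per-channel) 1D convolutions at several kernel sizes for the decoupled multi-scale extraction and a squeeze-excitation style gate for the channel-attention recoupling; kernel sizes, dimensions, and the gate design are illustrative assumptions, not the TC-DMFE/CMFF/DCAR modules.

import torch
import torch.nn as nn

class DecoupledMultiScaleBlock(nn.Module):
    def __init__(self, channels, kernel_sizes=(3, 7, 15)):
        super().__init__()
        # groups=channels -> each process variable (channel) is convolved independently (decoupling).
        self.branches = nn.ModuleList([
            nn.Conv1d(channels, channels, k, padding=k // 2, groups=channels)
            for k in kernel_sizes
        ])
        self.fuse = nn.Conv1d(channels * len(kernel_sizes), channels, kernel_size=1)
        self.gate = nn.Sequential(                       # channel attention for recoupling
            nn.AdaptiveAvgPool1d(1), nn.Conv1d(channels, channels, 1), nn.Sigmoid()
        )

    def forward(self, x):                                # x: (batch, variables, time)
        multi_scale = torch.cat([b(x) for b in self.branches], dim=1)
        fused = self.fuse(multi_scale)
        return fused * self.gate(fused)                  # re-weight variables by learned importance

x = torch.randn(8, 12, 200)                              # 8 shots, 12 process variables, 200 time steps
print(DecoupledMultiScaleBlock(12)(x).shape)             # torch.Size([8, 12, 200])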