
Information Fusion: Latest Publications

ChatAssistDesign: A language-interactive framework for iterative vector floorplan generation via conditional diffusion
IF 15.5 · Zone 1 (Computer Science) · Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE · Pub Date: 2025-12-20 · DOI: 10.1016/j.inffus.2025.104091
Luping Li , Xing Su , Han Lin , Haoying Han , Chao Fan , Zhao Zhang , Hongzhe Yue
Architectural design, a complex optimization process requiring iterative revisions by skilled architects, increasingly leverages computational tools. While deep generative models show promise in automating floorplan generation, two key limitations persist: (1) reliance on domain expertise, creating high technical barriers for non-experts, and (2) lack of iterative refinement capabilities, limiting post-generation adjustments. To address these challenges, we propose ChatAssistDesign, an interactive text-driven framework combining (1) Floorplan Designer, a large language model (LLM) agent guiding users through design workflows, and (2) ConDiffPlan, a vector-based conditional diffusion model for layout generation. Extensive experimental results demonstrate that our framework achieves significant improvements over state-of-the-art methods in terms of layout diversity, visual realism, text-to-layout alignment accuracy, and crucially, the ability to support iterative refinement while maintaining high robustness against constraint conflicts. By decoupling design complexity from user skill and enabling dynamic post hoc edits, our approach reduces entry barriers and improves integration with downstream tasks.
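The interaction pattern the abstract describes (an LLM agent that turns chat into constraints, and a conditional diffusion model that regenerates the layout from them) can be pictured with a minimal loop like the sketch below. It assumes nothing about the authors' implementation; `llm_parse`, `conditional_diffusion_sample`, and the `DesignState` fields are hypothetical stand-ins.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class DesignState:
    constraints: dict = field(default_factory=dict)  # e.g. room counts, adjacencies
    layout: Optional[list] = None                    # vector floorplan (room polygons)

def llm_parse(user_msg: str, state: DesignState) -> DesignState:
    """Stand-in for the Floorplan Designer agent: folds one instruction into
    the constraint set. A real agent would call an LLM here."""
    if "bedroom" in user_msg:
        state.constraints["bedrooms"] = state.constraints.get("bedrooms", 0) + 1
    return state

def conditional_diffusion_sample(constraints: dict, prev_layout=None) -> list:
    """Stand-in for ConDiffPlan: denoise a vector layout conditioned on the
    constraints. Warm-starting from prev_layout is what makes refinement
    iterative rather than a from-scratch regeneration."""
    n_rooms = max(sum(constraints.values()), 1)
    return [{"room": i, "poly": [(0, 0), (1, 0), (1, 1), (0, 1)]}
            for i in range(n_rooms)]

state = DesignState()
for msg in ["add a bedroom", "add another bedroom"]:
    state = llm_parse(msg, state)
    state.layout = conditional_diffusion_sample(state.constraints, state.layout)
    print(msg, "->", len(state.layout), "rooms")
```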
Citations: 0
Hypergraph attention and periodic fusion learning for enhanced flight delay prediction
IF 15.5 · Zone 1 (Computer Science) · Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE · Pub Date: 2025-12-20 · DOI: 10.1016/j.inffus.2025.104076
Chi Li , Haowen Jiang , Ruitao Zhou , Ye Dou , Zishun Shen , Lianmin Zhang , Xiongwen Qian , Jianfeng Mao
Predicting flight delays is crucial for enhancing operational efficiency, improving passenger satisfaction, and optimizing resource allocation within the aviation industry. Despite the numerous methods and technologies available in this field, current approaches largely rely on complex feature engineering and sampling techniques, and they do not thoroughly explore the core influencing factors of flight delays. To address the myriad challenges in predicting flight delays, we propose the Hypergraph Attention and Periodic Fusion Learning (HAPFL) framework. Our model comprises modules for hypergraph construction, O-D driven graph attention, multi-view flight embedding, and a period-aware sequential transformer. This holistic approach enables a thorough analysis of the micro and macro integration of flight node representations and, through periodic feature extraction, predicts the delay status of flights over multiple future days. Tested on several real-world datasets, our model consistently outperforms current state-of-the-art baseline models, achieving competitive results across all four classification metrics, demonstrating superior overall predictive performance and the effective learning capabilities of its well-designed modules. Our model innovatively captures high-order relationships between flights, significantly enhancing future delay predictions, and contributing to a deeper understanding of delay mechanisms and more effective flight schedule management.
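A minimal sketch of the kind of two-step hypergraph attention the abstract alludes to, where flights are nodes and hyperedges group flights that share a resource (for instance an aircraft rotation or an origin-destination pair): node features are first attention-pooled into hyperedge features, then each flight attends over its incident hyperedges. This is the generic hypergraph-attention recipe, not the HAPFL code; the dimensions and toy incidence matrix are made up.

```python
import torch
import torch.nn.functional as F

N, E, d = 6, 3, 8                       # flights, hyperedges, feature dim
X = torch.randn(N, d)                   # flight features
H = torch.zeros(N, E)                   # incidence matrix: H[n, e] = 1 if flight n in edge e
H[:3, 0] = H[2:5, 1] = H[4:, 2] = 1.0

W = torch.nn.Linear(d, d, bias=False)
a_edge = torch.nn.Linear(2 * d, 1)      # attention scorer for node -> hyperedge
Xw = W(X)

# step 1: attention-weighted node -> hyperedge aggregation
edge_feats = []
for e in range(E):
    members = H[:, e].bool()
    h = Xw[members]                                   # (m, d) member features
    ctx = h.mean(0, keepdim=True).expand_as(h)        # simple edge context
    alpha = F.softmax(a_edge(torch.cat([h, ctx], -1)), dim=0)
    edge_feats.append((alpha * h).sum(0))
Eft = torch.stack(edge_feats)                         # (E, d)

# step 2: hyperedge -> node, each flight attends over its incident hyperedges
scores = (Xw @ Eft.T).masked_fill(H == 0, float("-inf"))
X_new = F.softmax(scores, dim=1) @ Eft                # updated flight features
print(X_new.shape)  # torch.Size([6, 8])
```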
Citations: 0
ST-Imputer: Multivariate dependency-aware diffusion network with physics guidance for spatiotemporal imputation
IF 15.5 · Zone 1 (Computer Science) · Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE · Pub Date: 2025-12-20 · DOI: 10.1016/j.inffus.2025.104084
Xingyu Zhao , Jianpeng Qi , Bin Lu , Lei Zhou , Lei Cao , Junyu Dong , Yanwei Yu
Data preparation is crucial for achieving optimal results in deep learning. Unfortunately, missing values are common when preparing large-scale spatiotemporal databases. Most existing imputation methods primarily focus on exploring the spatiotemporal correlations of single-source data; however, high missing rates in single-source data result in sparse distributions. Furthermore, existing methods typically focus on shallow correlations at a single scale, limiting the ability of imputation models to effectively leverage multi-scale spatial features. To tackle these challenges, we propose a multivariate dependency-aware spatiotemporal imputation model, named ST-Imputer. Specifically, we introduce multi-source context data to provide sufficient correlation features for target data (i.e., data that needs imputation), alleviating the issue of insufficient available features caused by high missing rates in single-source data. By applying a multivariate spatiotemporal dependency extraction module, ST-Imputer captures potential associations between different spatial scales. Subsequently, the noise prediction module utilizes the learned dual-view features to formulate the spatiotemporal transmission module, thereby reducing weight errors caused by excessive noise. Finally, physical constraints are applied to prevent unrealistic predictions. Extensive experiments on three large-scale datasets demonstrate the significant superiority of ST-Imputer, achieving up to a 13.07 % improvement in RMSE. The code of our model is available at https://github.com/Lion1a/ST-Imputer.
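A minimal sketch of a self-supervised denoising step for diffusion-based imputation, roughly in the spirit described above: noise is added only where values are treated as missing, a network predicts that noise, and a physics-style regularizer is added on top. The cosine schedule, the smoothness penalty standing in for the paper's physical constraints, and the `eps_model` signature are all illustrative assumptions, not the authors' design.

```python
import torch

def imputation_step(eps_model, x, obs_mask, cond, T=1000):
    """x: (B, L, N) spatiotemporal tensor; obs_mask: 1 = observed, 0 = missing."""
    B = x.shape[0]
    t = torch.randint(1, T, (B,))
    abar = torch.cos(t.float() / T * torch.pi / 2) ** 2   # toy cosine schedule
    abar = abar.view(B, 1, 1)
    noise = torch.randn_like(x)
    # diffuse only the entries we pretend are missing (self-supervised target)
    x_t = torch.where(obs_mask.bool(), x,
                      abar.sqrt() * x + (1 - abar).sqrt() * noise)
    eps_hat = eps_model(x_t, t, cond)                     # cond = context sources
    miss = 1 - obs_mask
    diff_loss = ((eps_hat - noise).pow(2) * miss).sum() / miss.sum().clamp(min=1)
    # placeholder physics regularizer: penalize abrupt temporal jumps
    phys_loss = (x_t[:, 1:] - x_t[:, :-1]).pow(2).mean()
    return diff_loss + 0.1 * phys_loss

mask = torch.ones(2, 12, 4)
mask[:, ::3] = 0                                          # every 3rd step "missing"
loss = imputation_step(lambda x_t, t, c: torch.zeros_like(x_t),
                       torch.randn(2, 12, 4), mask, cond=None)
print(float(loss))
```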
Citations: 0
Shrinkage matters: evidence from accuracy-diversity trade-off in regression ensembles
IF 15.5 · Zone 1 (Computer Science) · Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE · Pub Date: 2025-12-19 · DOI: 10.1016/j.inffus.2025.104073
Han Feng , Pengyang Song , Yinuo Ren , Hanfeng Zhou , Jue Wang
Regression ensembles, a competitive machine learning technique, have gained popularity in recent years. Popular ensemble schemes have evolved from equal weights (EWs), which utilize simple averages, to optimal weights (OWs), which optimize weights by minimizing mean squared error (MSE). Extensive research has not only validated the robustness of EWs but also introduced the concept of shrinkage, shrinking OWs towards EWs. This paper tackles the ensemble challenge through diversity theory, where ensemble MSE is decomposed into two components: global error and global diversity. Within the decomposition framework, OWs typically minimize global error at the expense of reduced global diversity, while EWs tend to maximize global diversity but often ignore the accuracy. To address the accuracy-diversity trade-off, we derive an optimal shrinkage factor that manages to minimize the ensemble MSE. Simulation results reveal the mediation effect of shrinkage weights, and empirical experiments on six UCI datasets and Brent monthly futures prices demonstrate the superiority of the proposed method, whose mechanism is further expounded through an in-depth analysis of the shrinkage components. Overall, our approach provides a novel perspective on the efficacy of shrinkage in regression ensembles.
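The trade-off is easy to reproduce numerically. In the sketch below (not the paper's estimator), OWs are fit from a small sample of base-model errors, so they overfit the estimated covariance; interpolating toward EWs with a shrinkage factor λ then typically lowers the true ensemble MSE. λ is grid-searched against the known true covariance purely for illustration, whereas the paper derives an optimal factor analytically.

```python
import numpy as np

rng = np.random.default_rng(0)
k, n = 10, 40                              # 10 base models, only 40 error samples
A = rng.normal(size=(k, k))
Sigma = A @ A.T / k + np.eye(k)            # true error covariance
errs = rng.multivariate_normal(np.zeros(k), Sigma, size=n)
Sigma_hat = np.cov(errs, rowvar=False)     # noisy estimate used to fit OWs

w_ow = np.linalg.solve(Sigma_hat, np.ones(k))
w_ow /= w_ow.sum()                         # OWs: minimize w' Sigma_hat w, sum to 1
w_ew = np.ones(k) / k                      # EWs: simple average

def true_mse(w):                           # population ensemble MSE = w' Sigma w
    return float(w @ Sigma @ w)

lams = np.linspace(0.0, 1.0, 101)          # w(lam) = lam*EW + (1-lam)*OW
mses = [true_mse(l * w_ew + (1 - l) * w_ow) for l in lams]
best = int(np.argmin(mses))
print(f"OW mse={true_mse(w_ow):.3f}  EW mse={true_mse(w_ew):.3f}  "
      f"best lam={lams[best]:.2f}  shrunk mse={mses[best]:.3f}")
```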
Citations: 0
Survey of uncertainty estimation in LLMs - Sources, methods, applications, and challenges
IF 15.5 · Zone 1 (Computer Science) · Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE · Pub Date: 2025-12-19 · DOI: 10.1016/j.inffus.2025.104057
Jianfeng He , Linlin Yu , Changbin Li , Runing Yang , Fanglan Chen , Kangshuo Li , Min Zhang , Shuo Lei , Xuchao Zhang , Mohammad Beigi , Kaize Ding , Bei Xiao , Lifu Huang , Feng Chen , Ming Jin , Chang-Tien Lu
Large Language Models (LLMs) have demonstrated exceptional performance across a wide range of domains. However, inaccuracies in their outputs can lead to severe consequences in high-stakes areas such as finance and healthcare, where errors may result in the loss of money, time, or even lives. As a result, recent research has increasingly focused on uncertainty estimation in LLMs, aiming to quantify the trustworthiness of model-generated content given specific inputs. Despite this growing interest, the sources of uncertainty in LLMs remain insufficiently understood. To this end, this survey provides a comprehensive overview of uncertainty estimation for LLMs from the perspective of uncertainty sources, serving as a foundational resource for researchers entering the field. We begin by reviewing essential background on LLMs, followed by a detailed clarification of uncertainty sources relevant to them. We then introduce various uncertainty estimation methods, including both commonly used and LLM-specific approaches. Metrics for evaluating uncertainty are discussed, along with key application areas. Finally, we highlight major challenges and outline future research directions aimed at improving the trustworthiness and reliability of LLMs.
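As a concrete instance of the simplest family of methods such surveys cover, the sketch below computes token-level predictive entropy averaged over a generated sequence; the random logits are stand-ins for the per-step decoder logits of a real LLM.

```python
import torch

def mean_token_entropy(logits: torch.Tensor) -> float:
    """logits: (seq_len, vocab). Higher value = more uncertain generation."""
    logp = torch.log_softmax(logits, dim=-1)
    ent = -(logp.exp() * logp).sum(dim=-1)     # entropy per token, in nats
    return float(ent.mean())

print(mean_token_entropy(torch.randn(12, 50)))        # diffuse -> high entropy
print(mean_token_entropy(torch.randn(12, 50) * 10))   # peaked -> lower entropy
```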
Citations: 0
A unified framework for multimodal emotion recognition across homogeneous and heterogeneous modalities with adaptive fusion
IF 15.5 · Zone 1 (Computer Science) · Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE · Pub Date: 2025-12-19 · DOI: 10.1016/j.inffus.2025.104072
Abeer A. Wafa , Marwa S. Farhan , Mai M. Eldefrawi
Amid the growing demand for emotionally intelligent systems, Multimodal Emotion Recognition (MER) has emerged as a critical frontier in affective computing. However, achieving reliable generalization across heterogeneous data sources and ensuring semantic alignment across diverse modalities remain unresolved challenges. To this end, this research presents a novel and unified framework for MER that unfolds in five coordinated stages: modality-specific cross-dataset pretraining, diffusion-based generative data augmentation, reinforcement learning-driven hyperparameter optimization, latent space alignment, and task-aware multimodal fusion with fine-tuning. Each modality (text, audio, video, and motion) is initially pretrained using large-scale, emotion-labeled corpora to extract domain-invariant affective features. A generative augmentation stage that uses diffusion models increases sample diversity and improves class balance. Hyperparameter scheduling is governed by a Proximal Policy Optimization (PPO) agent that dynamically adjusts learning parameters during both pretraining and fine-tuning phases. Latent space alignment is achieved through a combination of domain-adversarial objectives, statistical regularization (e.g., MMD, CCA), and prototypical contrastive learning. The fusion strategy integrates Cross-Attentional Modality Interaction (CAMI), Bidirectional Alignment Networks (BAN), Gaussian Mixture Interaction Modules (GMIM), and Neural Variational Mixture-of-Experts (NV-MoE) to support context-aware and uncertainty-resilient emotion inference.
Empirical evaluations on MELD, IEMOCAP, and SAVEE demonstrate exceptional performance. Test accuracies reached 99.91 %, 99.87 %, and 99.52 %, respectively, with minimal losses (≤ 0.000056) and inference latencies between 0.02 and 0.07 ms. Post-alignment diagnostics across 100 runs revealed highly stable latent embeddings (Silhouette: 0.960-0.980, CKA: 0.970-0.990), confirming strong cross-modal coherence. Zero-shot testing on external unseen datasets (GoEmotions, CREMA-D, EmotiW, HUMAINE) yielded accuracies above 99.90 %, demonstrating robust generalization without fine-tuning. Even though the model is trained on batch data, the deployment through ONNX ensures adaptability for real-time emotion recognition in resource-constrained environments. These findings establish the proposed system as a highly performant and deployable solution for multimodal affect analysis.
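As one illustration of what a cross-attentional modality-interaction block can look like, the sketch below lets two modalities attend to each other and fuses the pooled views. It is a generic block in the spirit of CAMI, not the paper's architecture, and all dimensions and the mean-pooling fusion are arbitrary choices.

```python
import torch
import torch.nn as nn

class CrossModalBlock(nn.Module):
    def __init__(self, d=64, heads=4):
        super().__init__()
        self.t2a = nn.MultiheadAttention(d, heads, batch_first=True)
        self.a2t = nn.MultiheadAttention(d, heads, batch_first=True)
        self.fuse = nn.Linear(2 * d, d)

    def forward(self, text, audio):
        t, _ = self.t2a(text, audio, audio)    # text queries attend to audio
        a, _ = self.a2t(audio, text, text)     # audio queries attend to text
        pooled = torch.cat([t.mean(1), a.mean(1)], dim=-1)
        return self.fuse(pooled)               # joint emotion embedding

blk = CrossModalBlock()
out = blk(torch.randn(2, 20, 64), torch.randn(2, 50, 64))  # (batch, tokens, dim)
print(out.shape)  # torch.Size([2, 64])
```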
Citations: 0
STP-Diff: Synergistic fusion of spatial transformation perturbations and diffusion models for robust face privacy protection
IF 15.5 · Zone 1 (Computer Science) · Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE · Pub Date: 2025-12-18 · DOI: 10.1016/j.inffus.2025.104069
Mingyue Li , Yinghao Zhang , Ruizhong Du , Chunfu Jia , Xiaoyun Guang
The proliferation of digital portraits and the widespread adoption of advanced Face Recognition (FR) systems pose significant privacy threats, rendering the protection of facial identities paramount. However, existing methods face a universal challenge in balancing protection efficacy with visual fidelity: diffusion-based approaches often suffer from diminished protection due to their inherent purification effects, while standalone Spatial Transformation Perturbations (STPs) risk distorting critical facial features and often yield insufficient protection efficacy. To address these limitations, this paper introduces STP-Diff, a synergistic fusion method that integrates spatial and additive perturbations via a region-differentiated perturbation strategy. Specifically, our method applies non-additive spatial perturbations to non-salient regions as a pre-perturbation to resist the diffusion purification effect, thereby providing a more advantageous starting point for the subsequent diffusion model optimization. Building on this foundation, the method concentrates the potent generative capabilities of diffusion models onto identity-critical regions to generate effective additive perturbations for targeted protection. By strategically deploying spatial transformations, a largely under-explored technique in the facial privacy protection domain, our synergistic fusion strategy significantly enhances protection efficacy while achieving excellent visual quality. Extensive experiments on public datasets demonstrate that our method exhibits superior facial privacy protection in black-box targeted scenarios, achieving an average Protection Success Rate (PSR) of 81.09 % and a favorable Fréchet Inception Distance (FID) of 8.79, and demonstrates robust transferability against commercial Face Recognition platforms.
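The region-differentiated idea can be sketched mechanically: a small flow-field warp is applied outside an identity-critical mask (the spatial transformation perturbation), and an additive perturbation is applied inside it. The random perturbations, toy mask, and budgets below are placeholders; the actual method optimizes both components against face-recognition objectives and the diffusion model generates the additive part.

```python
import torch
import torch.nn.functional as F

def perturb(img, id_mask, eps_add=8 / 255, eps_flow=0.01):
    """img: (1,3,H,W) in [0,1]; id_mask: (1,1,H,W), 1 = identity-critical."""
    _, _, H, W = img.shape
    ys, xs = torch.meshgrid(torch.linspace(-1, 1, H),
                            torch.linspace(-1, 1, W), indexing="ij")
    base = torch.stack([xs, ys], dim=-1).unsqueeze(0)         # identity grid
    # spatial perturbation only on non-salient (non-identity) regions
    flow = eps_flow * torch.randn(1, H, W, 2) * (1 - id_mask).permute(0, 2, 3, 1)
    warped = F.grid_sample(img, base + flow, align_corners=True)
    # additive perturbation only inside the identity-critical mask
    delta = eps_add * torch.randn_like(img).clamp(-1, 1) * id_mask
    return (warped + delta).clamp(0, 1)

img = torch.rand(1, 3, 64, 64)
mask = torch.zeros(1, 1, 64, 64)
mask[..., 16:48, 16:48] = 1                                   # toy "face core"
print(perturb(img, mask).shape)  # torch.Size([1, 3, 64, 64])
```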
Citations: 0
Integrating visual and audio cues for emotion and gender recognition: A multi modal and multi task approach
IF 15.5 · Zone 1 (Computer Science) · Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE · Pub Date: 2025-12-18 · DOI: 10.1016/j.inffus.2025.104071
Giuseppe De Simone , Luca Greco , Alessia Saggese , Mario Vento
Gender and emotion recognition are traditionally analyzed independently using audio and video modalities, which introduces challenges when fusing their outputs and often results in increased computational overhead and latency. To address these limitations, in this work we introduce MAGNET (Multimodal Architecture for GeNder and Emotion Tasks), a novel multimodal multitask learning framework that jointly performs gender and emotion recognition by simultaneously analyzing audio and visual inputs. MAGNET employs soft parameter sharing, guided by GradNorm to balance task-specific learning dynamics. This design not only enhances recognition accuracy through effective modality fusion but also reduces model complexity by leveraging multitask learning. As a result, our approach is particularly well-suited for deployment on embedded devices, where computational efficiency and responsiveness are critical. Evaluated on the CREMA-D dataset, MAGNET consistently outperforms unimodal baselines and current state-of-the-art methods, demonstrating its effectiveness for efficient and accurate soft biometric analysis.
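GradNorm itself is published, so a compact version of its weight-update step for the two tasks (gender, emotion) can be sketched: task weights are nudged so that per-task gradient norms on a shared layer track each task's relative training rate. This is a simplification of the original recipe, and the choice of shared layer, alpha, and weight learning rate below are assumptions, not MAGNET's settings.

```python
import torch

def gradnorm_step(task_losses, loss0, w, shared_param, alpha=1.5, lr_w=0.025):
    """task_losses: list of scalar losses; loss0: initial losses; w: (T,) weights."""
    # per-task gradient norm on the shared layer; G_i = w_i * ||dL_i/dW||
    G = torch.stack([
        torch.autograd.grad(w[i] * L, shared_param,
                            retain_graph=True, create_graph=True)[0].norm()
        for i, L in enumerate(task_losses)])
    r = torch.stack([L.detach() / L0 for L, L0 in zip(task_losses, loss0)])
    r = r / r.mean()                              # relative inverse training rates
    target = (G.mean() * r ** alpha).detach()     # desired gradient norms
    gn_loss = (G - target).abs().sum()
    grad_w, = torch.autograd.grad(gn_loss, w)
    with torch.no_grad():                         # gradient step on the weights only
        w -= lr_w * grad_w
        w.clamp_(min=0.01)
        w *= len(task_losses) / w.sum()           # renormalize: weights sum to T
    return w

shared = torch.nn.Linear(4, 4)                    # stand-in for the shared trunk
h = shared(torch.randn(8, 4))
L_gender, L_emotion = h.pow(2).mean(), (h - 1).pow(2).mean()
w = torch.ones(2, requires_grad=True)
w = gradnorm_step([L_gender, L_emotion],
                  [L_gender.detach(), L_emotion.detach()], w, shared.weight)
print(w)
```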
Citations: 0
Span-aware temporal aggregation network for video moment retrieval
IF 15.5 · Zone 1 (Computer Science) · Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE · Pub Date: 2025-12-18 · DOI: 10.1016/j.inffus.2025.104075
Xingyu Shen , Jinshi Xiao , Xiang Zhang , Long Lan , Xinwang Liu
Video Moment Retrieval (VMR) aims to identify the temporal span in an untrimmed video that semantically corresponds to a natural language query. Existing methods often overlook temporal invariance, making them sensitive to variations in query span and limiting their performance, especially for retrieving short-span moments. To address this limitation, we propose a Span-aware Temporal Aggregation (STA) network that introduces span-aware features to capture temporally invariant patterns, thereby enhancing robustness to varying query spans. STA consists of two key components: (i) a span-aware feature aggregation (SFA) module constructs span-specific visual representations that are aligned with the query to generate span-aware features, which are then integrated into local candidate moments; (ii) a Query-guided Moment Reasoning (QMR) module, which dynamically adapts the receptive fields of temporal convolutions based on query span semantics to achieve fine-grained reasoning. Extensive experiments on three challenging benchmark datasets demonstrate that STA consistently outperforms state-of-the-art methods, with particularly notable gains for short-span moments.
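A minimal sketch of query-conditioned span scoring with an explicit span-width ("span-aware") feature, which is the generic mechanism underlying this family of methods rather than the STA network itself; the features, width embedding, and candidate grid are all toy values.

```python
import torch
import torch.nn as nn

T, d = 32, 16
frames = torch.randn(T, d)                 # per-frame video features
query = torch.randn(d)                     # sentence embedding
width_emb = nn.Embedding(T + 1, d)         # span-width (span-aware) feature

# candidate moments (start, end) on a coarse grid
cands = [(s, e) for s in range(0, T, 4)
         for e in range(s + 2, min(s + 16, T), 4)]

# each candidate: mean-pooled frames plus an embedding of its width
feats = torch.stack([frames[s:e].mean(0) + width_emb(torch.tensor(e - s))
                     for s, e in cands])
scores = feats @ query                     # query-span similarity
best = cands[int(scores.argmax())]
print("best span:", best)
```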
Citations: 0
Multitask reinforcement learning with metadata-guided adaptive routing
IF 15.5 · Zone 1 (Computer Science) · Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE · Pub Date: 2025-12-18 · DOI: 10.1016/j.inffus.2025.104068
Rui Pan , Haoran Luo , Quan Yuan , Guiyang Luo , Jinglin Li , Tiesunlong Shen , Rui Mao , Erik Cambria
Multitask reinforcement learning aims to train a unified policy that generalizes across multiple related tasks, improving sample efficiency and promoting knowledge transfer. However, existing methods often suffer from negative knowledge transfer due to task interference, especially when using hard parameter sharing across tasks with diverse dynamics or goals. Conventional solutions typically adopt shared backbones with task-specific heads, gradient projection methods, or routing-based networks to mitigate conflict. However, many of these methods rely on simplistic task identifiers (e.g., one-hot vectors), lack expressive representations of task semantics, or fail to modulate shared components in a fine-grained, task-specific manner. To overcome these challenges, we propose Metadata-guided Adaptive Routing (MetaAR), a novel framework that incorporates rich task metadata such as natural language descriptions to generate expressive and interpretable task representations. These representations are injected into a dynamic routing network, which adaptively reconfigures layer-wise computation paths in a shared modular policy network. To enable robust task-specific adaptation, we further introduce a noise-injected Top-K routing mechanism that dynamically selects the most relevant computation paths for each task. By injecting stochasticity during routing, this mechanism promotes exploration and mitigates interference between tasks through sparse, selective information flow. We evaluate MetaAR on the Meta-World benchmark with up to 50 robotic manipulation tasks, where it consistently outperforms strong baselines, achieving 4–8 % higher mean success rates than the best-performing methods across the MT10 and MT50 variants.
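Noise-injected Top-K gating is a standard mechanism, so the routing step can be sketched directly: a task embedding (assumed here to come from encoded task metadata such as a language description) produces gate logits, Gaussian noise is added during training for exploration, and only the top-k modules are kept. The module granularity, k, and the learned noise scale below are illustrative, not MetaAR's configuration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class NoisyTopKRouter(nn.Module):
    def __init__(self, d_task, n_modules, k=2):
        super().__init__()
        self.gate = nn.Linear(d_task, n_modules)    # clean gate logits
        self.noise = nn.Linear(d_task, n_modules)   # learned per-module noise scale
        self.k = k

    def forward(self, task_emb):
        logits = self.gate(task_emb)
        if self.training:                            # stochastic routing in training only
            logits = logits + torch.randn_like(logits) * F.softplus(self.noise(task_emb))
        top_v, top_i = logits.topk(self.k, dim=-1)   # keep only k modules per task
        weights = torch.softmax(top_v, dim=-1)       # sparse mixture weights
        return top_i, weights

router = NoisyTopKRouter(d_task=32, n_modules=8, k=2)
idx, w = router(torch.randn(4, 32))                  # 4 task embeddings in a batch
print(idx)            # selected module indices per task
print(w.sum(-1))      # weights sum to 1 per task
```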
Citations: 0