Federated learning in oncology: Bridging artificial intelligence innovation and privacy protection
Xin Qi, Tao Xu, Chengrun Dang, Zhuang Qi, Lei Meng, Han Yu
Pub Date: 2026-01-16 | DOI: 10.1016/j.inffus.2026.104154 | Information Fusion, Volume 130, Article 104154
Artificial intelligence (AI), including machine learning and deep learning models, is increasingly transforming oncology by providing powerful tools to analyze complex multidimensional data. However, developing reliable and generalizable models requires large-scale training datasets, which are often constrained by privacy regulations and the decentralized nature of medical data across institutions. Federated learning has recently emerged as a promising approach that enables collaborative model training across multiple sites without sharing raw data. This survey presents the fundamental principles and architectural frameworks of federated learning, highlighting its strengths in protecting data privacy, improving model robustness, and facilitating the integration of multi-omics and multi-modal datasets. Key applications in cancer detection, prognosis prediction, and treatment response prediction are discussed, underscoring its potential to support clinical decision-making. Moreover, the survey examines major challenges in applying federated learning to oncology and outlines key directions for advancing precision medicine, including the integration of multi-modal data, foundation models, causal reasoning, and continual learning. With ongoing technological advancements, federated learning holds great promise to bridge AI innovation and privacy protection in oncology.
{"title":"Federated learning in oncology: Bridging artificial intelligence innovation and privacy protection","authors":"Xin Qi , Tao Xu , Chengrun Dang , Zhuang Qi , Lei Meng , Han Yu","doi":"10.1016/j.inffus.2026.104154","DOIUrl":"10.1016/j.inffus.2026.104154","url":null,"abstract":"<div><div>Artificial intelligence (AI), including machine learning and deep learning models, is increasingly transforming oncology by providing powerful tools to analyze complex multidimensional data. However, developing reliable and generalizable models requires large-scale training datasets, which are often constrained by privacy regulations and the decentralized nature of medical data across institutions. Federated learning has recently emerged as a promising approach that enables collaborative model training across multiple sites without sharing raw data. This survey presents the fundamental principles and architectural frameworks of federated learning, highlighting its strengths in protecting data privacy, improving model robustness, and facilitating the integration of multi-omics and multi-modal datasets. Key applications in cancer detection, prognosis prediction, and treatment response prediction are discussed, underscoring its potential to support clinical decision-making. Moreover, the survey highlights major challenges in applying federated learning to oncology and outlines key directions to advance precision medicine, including the integration of multi-modal data, foundation models, causal reasoning, and continual learning. With ongoing technological advancements, federated learning holds great promise to bridge AI innovation and privacy protection in oncology.</div></div>","PeriodicalId":50367,"journal":{"name":"Information Fusion","volume":"130 ","pages":"Article 104154"},"PeriodicalIF":15.5,"publicationDate":"2026-01-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145995214","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
On the security and privacy of federated learning: A survey with attacks, defenses, frameworks, applications, and future directions
Daniel M. Jimenez-Gutierrez, Yelizaveta Falkouskaya, José L. Hernandez-Ramos, Aris Anagnostopoulos, Ioannis Chatzigiannakis, Andrea Vitaletti
Pub Date: 2026-01-16 | DOI: 10.1016/j.inffus.2026.104155 | Information Fusion, Volume 131, Article 104155
Federated Learning (FL) is an emerging distributed machine learning paradigm enabling multiple clients to train a global model collaboratively without sharing their raw data. While FL enhances data privacy by design, it remains vulnerable to various security and privacy threats. This survey provides a comprehensive overview of 203 papers on state-of-the-art attacks and the defense mechanisms developed to address them, categorizing the latter into security-enhancing and privacy-preserving techniques. Security-enhancing methods aim to improve FL robustness against malicious behaviors such as Byzantine attacks, poisoning, and Sybil attacks, while privacy-preserving techniques protect sensitive data through cryptographic approaches, differential privacy, and secure aggregation. We critically analyze the strengths and limitations of existing methods, highlight the trade-offs between privacy, security, and model performance, and discuss the implications of non-IID data distributions for the effectiveness of these defenses. Furthermore, we identify open research challenges and future directions, including the need for scalable, adaptive, and energy-efficient solutions that operate in dynamic and heterogeneous FL environments. Our survey aims to guide researchers and practitioners in developing robust and privacy-preserving FL systems, fostering advances that safeguard the integrity and confidentiality of collaborative learning frameworks.
{"title":"On the security and privacy of federated learning: A survey with attacks, defenses, frameworks, applications, and future directions","authors":"Daniel M. Jimenez-Gutierrez , Yelizaveta Falkouskaya , José L. Hernandez-Ramos , Aris Anagnostopoulos , Ioannis Chatzigiannakis , Andrea Vitaletti","doi":"10.1016/j.inffus.2026.104155","DOIUrl":"10.1016/j.inffus.2026.104155","url":null,"abstract":"<div><div>Federated Learning (FL) is an emerging distributed machine learning paradigm enabling multiple clients to train a global model collaboratively without sharing their raw data. While FL enhances data privacy by design, it remains vulnerable to various security and privacy threats. This survey provides a comprehensive overview of 203 papers regarding the state-of-the-art attacks and defense mechanisms developed to address these challenges, categorizing them into security-enhancing and privacy-preserving techniques. Security-enhancing methods aim to improve FL robustness against malicious behaviors such as byzantine attacks, poisoning, and Sybil attacks. At the same time, privacy-preserving techniques focus on protecting sensitive data through cryptographic approaches, differential privacy, and secure aggregation. We critically analyze the strengths and limitations of existing methods, highlight the trade-offs between privacy, security, and model performance, and discuss the implications of non-IID data distributions on the effectiveness of these defenses. Furthermore, we identify open research challenges and future directions, including the need for scalable, adaptive, and energy-efficient solutions operating in dynamic and heterogeneous FL environments. Our survey aims to guide researchers and practitioners in developing robust and privacy-preserving FL systems, fostering advancements safeguarding collaborative learning frameworks’ integrity and confidentiality.</div></div>","PeriodicalId":50367,"journal":{"name":"Information Fusion","volume":"131 ","pages":"Article 104155"},"PeriodicalIF":15.5,"publicationDate":"2026-01-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145995205","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Lifting wavelet transform-guided network with histogram attention for liver segmentation in CT scans
Huaxiang Liu, Wei Sun, Youyao Fu, Shiqing Zhang, Jie Jin, Jiangxiong Fang, Binliang Wang
Pub Date: 2026-01-16 | DOI: 10.1016/j.inffus.2026.104153 | Information Fusion, Volume 131, Article 104153
Accurate liver segmentation in computed tomography (CT) scans is crucial for the diagnosis of hepatocellular carcinoma and for surgical planning; however, manual delineation is laborious and prone to operator variability. Existing deep learning methods frequently sacrifice precise boundary delineation when expanding receptive fields or fail to leverage frequency-domain cues that encode global shape, while conventional attention mechanisms are less effective on low-contrast images. To address these challenges, we introduce LWT-Net, a novel liver segmentation network guided by a trainable lifting wavelet transform and equipped with a frequency-split histogram attention mechanism. LWT-Net embeds the trainable lifting wavelet transform within an encoder-decoder framework to hierarchically decompose features into low-frequency components that capture global structure and high-frequency bands that preserve edge and texture details. A complementary inverse lifting stage reconstructs high-resolution features while maintaining spatial consistency. The frequency-spatial fusion module, driven by a histogram-based attention mechanism, performs histogram-guided feature reorganization across global and local bins, while employing self-attention to capture long-range dependencies and prioritize anatomically significant regions. Comprehensive evaluations on the LiTS2017, WORD, and FLARE22 datasets confirm LWT-Net's superior performance, with mean Dice similarity coefficients of 95.96%, 97.15%, and 95.97%, respectively.
{"title":"Lifting wavelet transform-guided network with histogram attention for liver segmentation in CT scans","authors":"Huaxiang Liu , Wei Sun , Youyao Fu , Shiqing Zhang , Jie Jin , Jiangxiong Fang , Binliang Wang","doi":"10.1016/j.inffus.2026.104153","DOIUrl":"10.1016/j.inffus.2026.104153","url":null,"abstract":"<div><div>Accurate liver segmentation in computed tomography (CT) scans is crucial for the diagnosis of hepatocellular carcinoma and surgical planning; however, manual delineation is laborious and prone to operator variability. Existing deep learning methods frequently sacrifice precise boundary delineation when expanding receptive fields or fail to leverage frequency-domain cues that encode global shape, while conventional attention mechanisms are less effective in processing low-contrast images. To address these challenges, we introduce LWT-Net, a novel network guided by a trainable lifting wavelet transform, incorporating a frequency-split histogram attention mechanism to enhance liver segmentation. LWT-Net incorporates a trainable lifting wavelet transform within an encoder-decoder framework to hierarchically decompose features into low-frequency components that capture global structure and high-frequency bands that preserve edge and texture details. A complementary inverse lifting stage reconstructs high-resolution features while maintaining spatial consistency. The frequency-spatial fusion module, driven by a histogram-based attention mechanism, performs histogram-guided feature reorganization across global and local bins, while employing self-attention to capture long-range dependencies and prioritize anatomically significant regions. Comprehensive evaluations on the LiTS2017, WORD, and FLARE22 datasets confirm LWT-Net’s superior performance, achieving mean Dice similarity coefficients of 95.96%, 97.15%, and 95.97%.</div></div>","PeriodicalId":50367,"journal":{"name":"Information Fusion","volume":"131 ","pages":"Article 104153"},"PeriodicalIF":15.5,"publicationDate":"2026-01-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145995209","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A novel knowledge distillation and hybrid explainability approach for phenology stage classification from multi-source time series
Naeem Ullah, Andrés Manuel Chacón-Maldonado, Francisco Martínez-Álvarez, Ivanoe De Falco, Giovanna Sannino
Pub Date: 2026-01-16 | DOI: 10.1016/j.inffus.2026.104158 | Information Fusion, Volume 131, Article 104158
Accurate phenological stage classification is crucial for addressing global challenges to food security posed by climate change, water scarcity, and land degradation. It enables precision agriculture by optimizing key interventions such as irrigation, fertilization, and pest control. While deep learning offers powerful tools, existing methods face four key limitations: reliance on narrow features and models, limited long-term forecasting capability, computational inefficiency, and opaque, unvalidated explanations. To overcome these limitations, this paper presents a deep learning framework for phenology classification, utilizing multi-source time series data from satellite imagery, meteorological stations, and field observations. The approach emphasizes temporal consistency, spatial adaptability, computational efficiency, and explainability. A feature engineering pipeline extracts temporal dynamics via lag features, rolling statistics, Fourier transforms, and seasonal encodings. Feature selection combines incremental strategies with classical filter, wrapper, and embedded methods. Deep learning models from multiple paradigms (feedforward, recurrent, convolutional, and attention-based) are benchmarked on multi-horizon forecasting tasks. To reduce model complexity while preserving performance where possible, the framework employs knowledge distillation, transferring predictive knowledge from complex teacher models to compact, deployable student models. For model interpretability, a new Hybrid SHAP-Association Rule Explainability approach is proposed, integrating model-driven and data-driven explanations. Agreement between the two views is quantified using trust metrics (precision@k, coverage, and Jaccard similarity) together with a retraining-based validation mechanism. Experiments on phenology data from Andalusia demonstrate high accuracy, strong generalizability, trustworthy explanations, and resource-efficient phenology monitoring in agricultural systems.
{"title":"A novel knowledge distillation and hybrid explainability approach for phenology stage classification from multi-source time series","authors":"Naeem Ullah , Andrés Manuel Chacón-Maldonado , Francisco Martínez-Álvarez , Ivanoe De Falco , Giovanna Sannino","doi":"10.1016/j.inffus.2026.104158","DOIUrl":"10.1016/j.inffus.2026.104158","url":null,"abstract":"<div><div>Accurate phenological stage classification is crucial for addressing global challenges to food security posed by climate change, water scarcity, and land degradation. It enables precision agriculture by optimizing key interventions such as irrigation, fertilization, and pest control. While deep learning offers powerful tools, existing methods face four key limitations: reliance on narrow features and models, limited long-term forecasting capability, computational inefficiency, and opaque, unvalidated explanations. To overcome these limitations, this paper presents a deep learning framework for phenology classification, utilizing multi-source time series data from satellite imagery, meteorological stations, and field observations. The approach emphasizes temporal consistency, spatial adaptability, computational efficiency, and explainability. A feature engineering pipeline extracts temporal dynamics via lag features, rolling statistics, Fourier transforms and seasonal encodings. Feature selection combines incremental strategies with classical filter, wrapper, and embedded methods. Deep learning models across multiple paradigms-feedforward, recurrent, convolutional, and attention-based-are benchmarked under multi-horizon forecasting tasks. To reduce model complexity while preserving performance where possible, the framework employs knowledge distillation, transferring predictive knowledge from complex teacher models to compact and deployable student models. For model interpretability, a new Hybrid SHAP-Association Rule Explainability approach is proposed, integrating model-driven and data-driven explanations. Agreement between views is quantified using trust metrics: precision@k, coverage, and Jaccard similarity, with a retraining-based validation mechanism. Experiments on phenology data from Andalusia demonstrate high accuracy, strong generalizability, trustworthy explanations and resource-efficient phenology monitoring in agricultural systems.</div></div>","PeriodicalId":50367,"journal":{"name":"Information Fusion","volume":"131 ","pages":"Article 104158"},"PeriodicalIF":15.5,"publicationDate":"2026-01-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145995208","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Fusion of quantum computing with smart agriculture: A systematic review of methods, implementation, applications, and challenges
Sumit Kumar, Shashank Sheshar Singh, Gourav Bathla, Swati Sharma, Manisha Panjeta
Pub Date: 2026-01-16 | DOI: 10.1016/j.inffus.2026.104159 | Information Fusion, Volume 131, Article 104159
The growing global population and the severity of environmental issues are driving the agriculture sector to adopt innovative technological advances for sustainable food production. Classical computing approaches frequently struggle with the volume and complexity of agricultural data in tasks such as crop yield prediction, disease detection, soil analysis, and weather forecasting. This Systematic Literature Review (SLR) provides an in-depth analysis of the evolving significance of quantum computing in smart agriculture. Quantum algorithms have the potential to reduce computational complexity and create novel data representation methods for high-dimensional challenges by leveraging quantum mechanics principles such as superposition and entanglement. This paper employs a structured research methodology based on eight specific research questions to comprehensively investigate over 100 peer-reviewed studies on quantum computing and smart agriculture published between 2012 and 2025. It demonstrates the effectiveness of Quantum Machine Learning (QML), quantum optimization, and hybrid quantum-classical models in various agricultural applications. The survey examines real-world implementations and compares existing quantum initiatives to classical benchmarks on classification and prediction tasks. The presented work identifies the challenges and limitations of current quantum approaches and outlines directions for future work, including improving the accessibility of quantum hardware and developing domain-specific algorithms. To the best of our knowledge, this is the first research question-driven SLR that provides an in-depth analysis of how quantum computing can be applied in agricultural applications.
{"title":"Fusion of quantum computing with smart agriculture: A systematic review of methods, implementation, applications, and challenges","authors":"Sumit Kumar , Shashank Sheshar Singh , Gourav Bathla , Swati Sharma , Manisha Panjeta","doi":"10.1016/j.inffus.2026.104159","DOIUrl":"10.1016/j.inffus.2026.104159","url":null,"abstract":"<div><div>The growing global population and the severity of environmental issues are driving the agriculture sector to adopt innovative technological advances for sustainable food production. Classical computing approaches frequently struggle with the volume and complexity of agricultural data when performing tasks such as crop yield prediction, disease detection, soil analysis, and weather forecasting. This Systematic Literature Review (SLR) provides an in-depth analysis of the evolving significance of quantum computing in smart agriculture. Quantum algorithms have the potential to reduce computational complexity and create novel data representation methods for high-dimensional challenges by leveraging quantum mechanics principles such as superposition and entanglement. This paper employs a structured research methodology based on eight specific research questions to comprehensively investigate over 100 peer-reviewed studies on quantum computing and smart agriculture published between 2012 and 2025. It demonstrates the effectiveness of Quantum Machine Learning (QML), quantum optimization, and hybrid quantum-classical models in various agricultural applications. The survey examines real-world implementations and compares existing quantum initiatives to classical benchmarks for the classification and prediction tasks. The presented work identifies challenges and limitations of current quantum approaches. The paper outlines directions for future work, including the accessibility of quantum hardware and the development of domain-specific algorithms. To the best of our knowledge, this is the first research question-driven SLR that provides an in-depth analysis of how quantum computing can be applied in agricultural applications.</div></div>","PeriodicalId":50367,"journal":{"name":"Information Fusion","volume":"131 ","pages":"Article 104159"},"PeriodicalIF":15.5,"publicationDate":"2026-01-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145995251","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Negative can be positive: A stable and noise-resistant complementary contrastive learning for cross-modal matching
Fangming Zhong, Xinyu He, Haiquan Yu, Xiu Liu, Suhua Zhang
Pub Date: 2026-01-16 | DOI: 10.1016/j.inffus.2026.104156 | Information Fusion, Volume 130, Article 104156
Cross-modal matching with noisy correspondence has drawn considerable interest recently, because mismatched pairs are inevitably introduced when data are collected from the Internet. Training on such noisy data often leads to severe performance degradation, as conventional methods tend to overfit rapidly to mismatched pairs. Most existing methods focus on predicting more reliable soft correspondence, assigning higher weights to pairs that are more likely to be correct. However, two limitations remain: (1) they ignore the informative signals embedded in negative pairs, and (2) they are unstable because of their sensitivity to the noise ratio. To address these issues, we explicitly take negatives into account and propose a stable and noise-resistant complementary learning method, named Dual Contrastive Learning (DCL), for cross-modal matching with noisy correspondence. DCL leverages both positive and negative pairs to improve robustness; with complementary contrastive learning, negative pairs also contribute positively to model optimization. Specifically, to fully exploit the potential of mismatched data, we first partition the training data into clean and noisy subsets based on the memorization effect of deep neural networks. We then employ vanilla contrastive learning for matched positive pairs in the clean subset, and complementary contrastive learning for negative pairs, including those in the noisy subset. In doing so, the proposed method remains robust at any noise ratio, balancing positive and negative information. Extensive experiments indicate that DCL significantly outperforms state-of-the-art methods and exhibits remarkable stability, with extremely low variance in R@1. Specifically, the R@1 scores of DCL are 7% and 9.1% higher than those of NPC on image-to-text and text-to-image retrieval, respectively. The source code is released at https://github.com/hxy2969/dcl.
{"title":"Negative can be positive: A stable and noise-resistant complementary contrastive learning for cross-modal matching","authors":"Fangming Zhong , Xinyu He , Haiquan Yu , Xiu Liu , Suhua Zhang","doi":"10.1016/j.inffus.2026.104156","DOIUrl":"10.1016/j.inffus.2026.104156","url":null,"abstract":"<div><div>Cross-modal matching with noisy correspondence has drawn considerable interest recently, due to the mismatched data imposed inevitably when collecting data from the Internet. Training on such noisy data often leads to severe performance degradation, as conventional methods tend to overfit rapidly to wrongly mismatched pairs. Most of the existing methods focus on predicting more reliable soft correspondence, generating higher weights for the pairs that are more likely to be correct. However, there still remain two limitations: (1) they ignore the informative signals embedded in the negative pairs, and (2) the instability of existing methods due to their sensitivity to the noise ratio. To address these issues, we explicitly take the negatives into account and propose a stable and noise-resistant complementary learning method, named Dual Contrastive Learning (DCL), for cross-modal matching with noisy correspondence. DCL leverages both positive pairs and negative pairs to improve the robustness. With the complementary contrastive learning, the negative pairs also contribute positively to the model optimization. Specifically, to fully explore the potential of mismatched data, we first partition the training data into clean and noisy subsets based on the memorization effect of deep neural networks. Then, we employ vanilla contrastive learning for positive matched pairs in the clean subset. As for negative pairs including the noisy subsets, complementary contrastive learning is adopted. In such doing, whatever the level of noise ratio is, the proposed method is robust to balance the positive information and negative information. Extensive experiments indicate that DCL significantly outperforms the state-of-the-art methods and exhibits remarkable stability with an extremely low variance of R@1. Specifically, the R@1 scores of our DCL are 7% and 9.1% higher than NPC on image-to-text and text-to-image, respectively. The source code is released at <span><span>https://github.com/hxy2969/dcl</span><svg><path></path></svg></span>.</div></div>","PeriodicalId":50367,"journal":{"name":"Information Fusion","volume":"130 ","pages":"Article 104156"},"PeriodicalIF":15.5,"publicationDate":"2026-01-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145995210","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
MulMoSenT: Multimodal sentiment analysis for a low-resource language using textual-visual cross-attention and fusion
Sadia Afroze, Md. Rajib Hossain, Mohammed Moshiul Hoque, Nazmul Siddique
Pub Date: 2026-01-15 | DOI: 10.1016/j.inffus.2026.104129 | Information Fusion, Volume 131, Article 104129
The widespread availability of the Internet and the growing use of smart devices have fueled the rapid expansion of multimodal (image-text) sentiment analysis (MSA), a burgeoning research field driven by the massive volume of image-text data these technologies generate. However, MSA faces significant challenges, notably misalignment between images and text, where an image may carry multiple interpretations or contradict its paired text. In addition, short textual content often lacks sufficient context, complicating sentiment prediction. These issues are particularly acute in low-resource languages, where annotated image-text corpora are scarce and Vision-Language Models (VLMs) and Large Language Models (LLMs) exhibit limited performance. This research introduces MulMoSenT, a multimodal image-text sentiment analysis system tailored to tackle these challenges for low-resource languages. The development of MulMoSenT unfolds across four key phases: corpus development, baseline model evaluation and selection, hyperparameter adaptation, and model fine-tuning and inference. The proposed MulMoSenT model achieves a peak accuracy of 84.90%, surpassing all baseline models with a 37.83% improvement over VLMs, a 35.28% gain over image-only models, and a 0.71% enhancement over text-only models. Both the dataset and the solution are publicly accessible at: https://github.com/sadia-afroze/MulMoSenT.
{"title":"MulMoSenT: Multimodal sentiment analysis for a low-resource language using textual-visual cross-attention and fusion","authors":"Sadia Afroze , Md. Rajib Hossain , Mohammed Moshiul Hoque , Nazmul Siddique","doi":"10.1016/j.inffus.2026.104129","DOIUrl":"10.1016/j.inffus.2026.104129","url":null,"abstract":"<div><div>The widespread availability of the Internet and the growing use of smart devices have fueled the rapid expansion of multimodal (image-text) sentiment analysis (MSA), a burgeoning research field. This growth is driven by the massive volume of image-text data generated by these technologies. However, MSA faces significant challenges, notably the misalignment between images and text, where an image may carry multiple interpretations or contradict its paired text. In addition, short textual content often lacks sufficient context, complicating sentiment prediction. These issues are particularly acute in low-resource languages, where annotated image-text corpora are scarce, and Vision-Language Models (VLMs) and Large Language Models (LLMs) exhibit limited performance. This research introduces <strong>MulMoSenT</strong>, a multimodal image-text sentiment analysis system tailored to tackle these challenges for low-resource languages. The development of <strong>MulMoSenT</strong> unfolds across four key phases: corpus development, baseline model evaluation and selection, hyperparameter adaptation, and model fine-tuning and inference. The proposed <strong>MulMoSenT</strong> model achieves a peak accuracy of 84.90%, surpassing all baseline models. Delivers a 37. 83% improvement over VLMs, a 35.28% gain over image-only models, and a 0.71% enhancement over text-only models. Both the dataset and the solution are publicly accessible at: <span><span>https://github.com/sadia-afroze/MulMoSenT</span><svg><path></path></svg></span>.</div></div>","PeriodicalId":50367,"journal":{"name":"Information Fusion","volume":"131 ","pages":"Article 104129"},"PeriodicalIF":15.5,"publicationDate":"2026-01-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145995211","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
ExInCOACH: Strategic exploration meets interactive tutoring for context-aware game onboarding
Rui Hua, Zhaoyu Huang, Jinhao Lu, Yakun Li, Na Zhao
Pub Date: 2026-01-14 | DOI: 10.1016/j.inffus.2026.104151 | Information Fusion, Volume 131, Article 104151
Traditional game tutorials often fail to deliver real-time contextual guidance, providing static instructions disconnected from dynamic gameplay states. This limitation stems from their inability to interpret evolving game environments and generate high-quality decisions during live player interactions. We present ExInCOACH, a hybrid framework that synergizes exploratory reinforcement learning (RL) with interactive large language models (LLMs) to enable state-aware adaptive tutoring. Our framework first employs deep RL to discover strategic patterns via self-play, constructing a Q-function. During player onboarding, LLMs map the Q-values of currently legal actions and their usage conditions into natural language rule explanations and strategic advice by analyzing live game states and player decisions.
Evaluations in Dou Di Zhu (a turn-based card game) reveal that learners using ExInCOACH experienced intuitive strategy internalization: all participants reported grasping advanced tactics faster than through rule-based tutorials, while most players highly valued the real-time contextual feedback. A comparative study demonstrated that players trained with ExInCOACH achieved a 70% win rate (14 wins in 20 games) against those onboarded via traditional methods, as they benefited from adaptive guidance that evolved with their skill progression. To further validate the framework's generalizability, evaluations were also conducted in StarCraft II, a high-complexity real-time strategy (RTS) game. In 2v2 cooperative battles, teams trained with ExInCOACH achieved a 66.7% win rate against teams assisted by Vision LLMs (VLLMs) and a 100% win rate against teams relying on traditional static game wikis. Cognitive load assessments indicated that ExInCOACH significantly reduced players' mental burden and frustration in complex scenarios involving real-time decision-making and multi-unit collaboration, while also outperforming traditional methods in information absorption efficiency and tactical adaptability. This work proposes a game tutorial design paradigm based on RL model exploration and LLM rule interpretation, making AI-generated strategies accessible through natural language interaction tailored to individual learning contexts.
{"title":"ExInCOACH: Strategic exploration meets interactive tutoring for context-aware game onboarding","authors":"Rui Hua , Zhaoyu Huang , Jinhao Lu , Yakun Li , Na Zhao","doi":"10.1016/j.inffus.2026.104151","DOIUrl":"10.1016/j.inffus.2026.104151","url":null,"abstract":"<div><div>Traditional game tutorials often fail to deliver real-time contextual guidance, providing static instructions disconnected from dynamic gameplay states. This limitation stems from their inability to interpret evolving game environments and generate high-quality decisions during live player interactions. We present ExInCOACH, a hybrid framework that synergizes exploratory reinforcement learning (RL) with interactive large language models (LLMs) to enable state-aware adaptive tutoring. Our framework first employs deep RL to discover strategic patterns via self-play, constructing a Q-function. During player onboarding, LLMs map the Q-values of currently legal actions and their usage conditions into natural language rule explanations and strategic advice by analyzing live game states and player decisions.</div><div>Evaluations in Dou Di Zhu (a turn-based card game) reveal that learners using ExInCOACH experienced intuitive strategy internalization-all participants reported grasping advanced tactics faster than through rule-based tutorials, while most players highly valued the real-time contextual feedback. A comparative study demonstrated that players trained with ExInCOACH achieved a 70% win rate (14 wins/20 games) against those onboarded via traditional methods, as they benefited from adaptive guidance that evolved with their skill progression. To further validate the framework’s generalizability, evaluations were also conducted in StarCraft II, a high-complexity real-time strategy (RTS) game. In 2v2 cooperative battles, teams trained with ExInCOACH achieved a 66.7% win rate against teams assisted by Vision LLMs (VLLMs) and an impressive 100% win rate against teams relying on traditional static game wikis for learning. Cognitive load assessments indicated that ExInCOACH significantly reduced players- mental burden and frustration in complex scenarios involving real-time decision-making and multi-unit collaboration, while also outperforming traditional methods in information absorption efficiency and tactical adaptability. This work proposes a game tutorial design paradigm based on RL model exploration & LLM rule interpretation, making AI-generated strategies accessible through natural language interaction tailored to individual learning contexts.</div></div>","PeriodicalId":50367,"journal":{"name":"Information Fusion","volume":"131 ","pages":"Article 104151"},"PeriodicalIF":15.5,"publicationDate":"2026-01-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145995212","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
GeoCraft: A Diffusion Model-based 3D Reconstruction Method driven by image and point cloud fusion
Weixuan Ma, Yamin Li, Chujin Liu, Hao Zhang, Jie Li, Kansong Chen, Weixuan Gao
Pub Date: 2026-01-13 | DOI: 10.1016/j.inffus.2026.104149 | Information Fusion, Volume 131, Article 104149
With the rapid development of technologies like virtual reality (VR), autonomous driving, and digital twins, the demand for high-precision and realistic multimodal 3D reconstruction has surged. This technology has become a core research focus in computer vision and graphics due to its ability to integrate multi-source data, such as 2D images and point clouds. However, existing methods face challenges such as geometric inconsistency in single-view reconstruction, poor point cloud-to-mesh conversion, and insufficient multimodal feature fusion, limiting their practical application. To address these issues, this paper proposes GeoCraft, a multimodal 3D reconstruction method that generates high-precision 3D models from 2D images through three collaborative stages: Diff2DPoint, Point2DMesh, and Vision3DGen. Specifically, Diff2DPoint generates an initial, geometrically aligned point cloud using a diffusion model and projection feature fusion; Point2DMesh converts the point cloud into a high-quality mesh using an autoregressive decoder-only Transformer and Direct Preference Optimization (DPO); and Vision3DGen creates high-fidelity 3D objects through multimodal feature alignment. Experiments on the Google Scanned Objects (GSO) and Pix3D datasets show that GeoCraft excels on key metrics: on GSO it achieves a CMMD of 2.810 and an FID-CLIP of 26.420, and on Pix3D a CMMD of 3.020 and an FID-CLIP of 27.030. GeoCraft significantly outperforms existing 3D reconstruction methods and also demonstrates advantages in computational efficiency, effectively addressing key challenges in 3D reconstruction. The code is available at https://github.com/weixuanma/GeoCraft.
{"title":"GeoCraft: A Diffusion Model-based 3D Reconstruction Method driven by image and point cloud fusion","authors":"Weixuan Ma , Yamin Li , Chujin Liu , Hao Zhang , Jie Li , Kansong Chen , Weixuan Gao","doi":"10.1016/j.inffus.2026.104149","DOIUrl":"10.1016/j.inffus.2026.104149","url":null,"abstract":"<div><div>With the rapid development of technologies like virtual reality (VR), autonomous driving, and digital twins, the demand for high-precision and realistic multimodal 3D reconstruction has surged. This technology has become a core research focus in computer vision and graphics due to its ability to integrate multi-source data, such as 2D images and point clouds. However, existing methods face challenges such as geometric inconsistency in single-view reconstruction, poor point cloud-to-mesh conversion, and insufficient multimodal feature fusion, limiting their practical application. To address these issues, this paper proposes GeoCraft, a multimodal 3D reconstruction method that generates high-precision 3D models from 2D images through three collaborative stages: Diff2DPoint, Point2DMesh, and Vision3DGen. Specifically, Diff2DPoint generates an initial point cloud with geometric alignment using a diffusion model and projection feature fusion; Point2DMesh converts the point cloud into a high-quality mesh using an autoregressive decoder-only Transformer and Direct Preference Optimization (DPO); Vision3DGen creates high-fidelity 3D objects through multimodal feature alignment. Experiments on the Google Scanned Objects (GSO) and Pix3D datasets show that GeoCraft excels in key metrics. On the GSO dataset, its CMMD is 2.810 and FID<sub>CLIP</sub> is 26.420; on Pix3D, CMMD is 3.020 and FID<sub>CLIP</sub> is 27.030. GeoCraft significantly outperforms existing 3D reconstruction methods and also demonstrates advantages in computational efficiency, effectively solving key challenges in 3D reconstruction.The code is available at <span><span>https://github.com/weixuanma/GeoCraft</span><svg><path></path></svg></span>.</div></div>","PeriodicalId":50367,"journal":{"name":"Information Fusion","volume":"131 ","pages":"Article 104149"},"PeriodicalIF":15.5,"publicationDate":"2026-01-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145961755","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
GC-Fed: Gradient centralized federated learning with partial client participation
Jungwon Seo, Ferhat Ozgur Catak, Chunming Rong, Kibeom Hong, Minhoe Kim
Pub Date: 2026-01-13 | DOI: 10.1016/j.inffus.2026.104148 | Information Fusion, Volume 131, Article 104148
Federated Learning (FL) enables privacy-preserving multi-source information fusion (MSIF) but suffers from client drift in highly heterogeneous data settings. Many existing approaches mitigate drift by providing clients with common reference points, typically derived from past information, to align objectives or gradient directions. However, under severe partial participation, such history-dependent references may become unreliable, as the set of client data distributions participating in each round can vary drastically. To overcome this limitation, we propose a method that mitigates client drift without relying on past information by constraining the update space through Gradient Centralization (GC). Specifically, we introduce Local GC and Global GC, which apply GC at the local and global update stages, respectively, and further present GC-Fed, a hybrid formulation that generalizes both. Theoretical analysis and extensive experiments on benchmark FL tasks demonstrate that GC-Fed effectively alleviates client drift and achieves up to 20% accuracy improvement under heterogeneous data and partial client participation.
{"title":"GC-Fed: Gradient centralized federated learning with partial client participation","authors":"Jungwon Seo , Ferhat Ozgur Catak , Chunming Rong , Kibeom Hong , Minhoe Kim","doi":"10.1016/j.inffus.2026.104148","DOIUrl":"10.1016/j.inffus.2026.104148","url":null,"abstract":"<div><div>Federated Learning (FL) enables privacy-preserving multi-source information fusion (MSIF) but suffers from client drift in highly heterogeneous data settings. Many existing approaches mitigate drift by providing clients with common reference points, typically derived from past information, to align objectives or gradient directions. However, under severe partial participation, such history-dependent references may become unreliable, as the set of client data distributions participating in each round can vary drastically. To overcome this limitation, we propose a method that mitigates client drift without relying on past information by constraining the update space through Gradient Centralization (GC). Specifically, we introduce <span>Local GC</span> and <span>Global GC</span>, which apply GC at the local and global update stages, respectively, and further present <span>GC-Fed</span>, a hybrid formulation that generalizes both. Theoretical analysis and extensive experiments on benchmark FL tasks demonstrate that <span>GC-Fed</span> effectively alleviates client drift and achieves up to 20 % accuracy improvement under data heterogeneous and partial participation conditions.</div></div>","PeriodicalId":50367,"journal":{"name":"Information Fusion","volume":"131 ","pages":"Article 104148"},"PeriodicalIF":15.5,"publicationDate":"2026-01-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145962592","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}