Lifting wavelet transform-guided network with histogram attention for liver segmentation in CT scans
Pub Date: 2026-01-16 | DOI: 10.1016/j.inffus.2026.104153
Huaxiang Liu, Wei Sun, Youyao Fu, Shiqing Zhang, Jie Jin, Jiangxiong Fang, Binliang Wang
Accurate liver segmentation in computed tomography (CT) scans is crucial for the diagnosis of hepatocellular carcinoma and for surgical planning; however, manual delineation is laborious and prone to operator variability. Existing deep learning methods frequently sacrifice precise boundary delineation when expanding receptive fields or fail to leverage frequency-domain cues that encode global shape, while conventional attention mechanisms are less effective on low-contrast images. To address these challenges, we introduce LWT-Net, a novel network guided by a trainable lifting wavelet transform and equipped with a frequency-split histogram attention mechanism for liver segmentation. LWT-Net embeds the trainable lifting wavelet transform within an encoder-decoder framework to hierarchically decompose features into low-frequency components that capture global structure and high-frequency bands that preserve edge and texture details. A complementary inverse lifting stage reconstructs high-resolution features while maintaining spatial consistency. A frequency-spatial fusion module, driven by histogram-based attention, performs histogram-guided feature reorganization across global and local bins and employs self-attention to capture long-range dependencies and prioritize anatomically significant regions. Comprehensive evaluations on the LiTS2017, WORD, and FLARE22 datasets confirm LWT-Net's superior performance, with mean Dice similarity coefficients of 95.96%, 97.15%, and 95.97%, respectively.
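For illustration, here is a minimal sketch of one trainable lifting step (split, predict, update) with its exact inverse, the generic building block behind lifting wavelet transforms; the module and its convolutional predictor/updater are illustrative assumptions, not the LWT-Net implementation.

```python
import torch
import torch.nn as nn

class LiftingStep1D(nn.Module):
    """One trainable lifting step along the width axis: splits a feature map
    into a low-frequency approximation and a high-frequency detail band."""
    def __init__(self, channels):
        super().__init__()
        # Small learnable predictor and updater -- the "trainable" part of the lifting scheme.
        self.predict = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.update = nn.Conv2d(channels, channels, kernel_size=3, padding=1)

    def forward(self, x):                              # x: (B, C, H, W), W assumed even
        even, odd = x[..., 0::2], x[..., 1::2]         # split step
        detail = odd - self.predict(even)              # predict step -> high-frequency band
        approx = even + self.update(detail)            # update step  -> low-frequency band
        return approx, detail

    def inverse(self, approx, detail):
        """Inverse lifting: exactly reconstructs the even/odd samples."""
        even = approx - self.update(detail)
        odd = detail + self.predict(even)
        x = torch.zeros(*even.shape[:-1], even.shape[-1] * 2, device=even.device)
        x[..., 0::2], x[..., 1::2] = even, odd
        return x

# Usage: approx, detail = LiftingStep1D(64)(torch.randn(1, 64, 32, 32))
```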
{"title":"Lifting wavelet transform-guided network with histogram attention for liver segmentation in CT scans","authors":"Huaxiang Liu , Wei Sun , Youyao Fu , Shiqing Zhang , Jie Jin , Jiangxiong Fang , Binliang Wang","doi":"10.1016/j.inffus.2026.104153","DOIUrl":"10.1016/j.inffus.2026.104153","url":null,"abstract":"<div><div>Accurate liver segmentation in computed tomography (CT) scans is crucial for the diagnosis of hepatocellular carcinoma and surgical planning; however, manual delineation is laborious and prone to operator variability. Existing deep learning methods frequently sacrifice precise boundary delineation when expanding receptive fields or fail to leverage frequency-domain cues that encode global shape, while conventional attention mechanisms are less effective in processing low-contrast images. To address these challenges, we introduce LWT-Net, a novel network guided by a trainable lifting wavelet transform, incorporating a frequency-split histogram attention mechanism to enhance liver segmentation. LWT-Net incorporates a trainable lifting wavelet transform within an encoder-decoder framework to hierarchically decompose features into low-frequency components that capture global structure and high-frequency bands that preserve edge and texture details. A complementary inverse lifting stage reconstructs high-resolution features while maintaining spatial consistency. The frequency-spatial fusion module, driven by a histogram-based attention mechanism, performs histogram-guided feature reorganization across global and local bins, while employing self-attention to capture long-range dependencies and prioritize anatomically significant regions. Comprehensive evaluations on the LiTS2017, WORD, and FLARE22 datasets confirm LWT-Net’s superior performance, achieving mean Dice similarity coefficients of 95.96%, 97.15%, and 95.97%.</div></div>","PeriodicalId":50367,"journal":{"name":"Information Fusion","volume":"131 ","pages":"Article 104153"},"PeriodicalIF":15.5,"publicationDate":"2026-01-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145995209","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A novel knowledge distillation and hybrid explainability approach for phenology stage classification from multi-source time series
Pub Date: 2026-01-16 | DOI: 10.1016/j.inffus.2026.104158
Naeem Ullah, Andrés Manuel Chacón-Maldonado, Francisco Martínez-Álvarez, Ivanoe De Falco, Giovanna Sannino
Accurate phenological stage classification is crucial for addressing global challenges to food security posed by climate change, water scarcity, and land degradation. It enables precision agriculture by optimizing key interventions such as irrigation, fertilization, and pest control. While deep learning offers powerful tools, existing methods face four key limitations: reliance on narrow features and models, limited long-term forecasting capability, computational inefficiency, and opaque, unvalidated explanations. To overcome these limitations, this paper presents a deep learning framework for phenology classification that utilizes multi-source time series data from satellite imagery, meteorological stations, and field observations. The approach emphasizes temporal consistency, spatial adaptability, computational efficiency, and explainability. A feature engineering pipeline extracts temporal dynamics via lag features, rolling statistics, Fourier transforms, and seasonal encodings. Feature selection combines incremental strategies with classical filter, wrapper, and embedded methods. Deep learning models across multiple paradigms (feedforward, recurrent, convolutional, and attention-based) are benchmarked on multi-horizon forecasting tasks. To reduce model complexity while preserving performance where possible, the framework employs knowledge distillation, transferring predictive knowledge from complex teacher models to compact, deployable student models. For model interpretability, a new Hybrid SHAP-Association Rule Explainability approach is proposed, integrating model-driven and data-driven explanations. Agreement between the two views is quantified using trust metrics (precision@k, coverage, and Jaccard similarity) with a retraining-based validation mechanism. Experiments on phenology data from Andalusia demonstrate high accuracy, strong generalizability, trustworthy explanations, and resource-efficient phenology monitoring in agricultural systems.
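As a concrete illustration of the feature engineering pipeline described above, the sketch below builds lag features, rolling statistics, and Fourier-based seasonal encodings for a single time series; the column names ("ndvi", "doy") and window choices are hypothetical placeholders rather than the paper's schema.

```python
import numpy as np
import pandas as pd

def add_temporal_features(df, col="ndvi", lags=(1, 2, 4), windows=(3, 7), k_fourier=2):
    """Append lag, rolling-statistic, and Fourier seasonal features to a time-ordered frame."""
    out = df.copy()
    for lag in lags:                                   # lag features
        out[f"{col}_lag{lag}"] = out[col].shift(lag)
    for w in windows:                                  # rolling statistics
        out[f"{col}_mean{w}"] = out[col].rolling(w).mean()
        out[f"{col}_std{w}"] = out[col].rolling(w).std()
    doy = out["doy"] / 365.25                          # day of year scaled to [0, 1]
    for k in range(1, k_fourier + 1):                  # Fourier seasonal encodings
        out[f"sin_{k}"] = np.sin(2 * np.pi * k * doy)
        out[f"cos_{k}"] = np.cos(2 * np.pi * k * doy)
    return out

# Usage:
# df = pd.DataFrame({"ndvi": np.random.rand(50), "doy": np.arange(1, 51)})
# features = add_temporal_features(df)
```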
{"title":"A novel knowledge distillation and hybrid explainability approach for phenology stage classification from multi-source time series","authors":"Naeem Ullah , Andrés Manuel Chacón-Maldonado , Francisco Martínez-Álvarez , Ivanoe De Falco , Giovanna Sannino","doi":"10.1016/j.inffus.2026.104158","DOIUrl":"10.1016/j.inffus.2026.104158","url":null,"abstract":"<div><div>Accurate phenological stage classification is crucial for addressing global challenges to food security posed by climate change, water scarcity, and land degradation. It enables precision agriculture by optimizing key interventions such as irrigation, fertilization, and pest control. While deep learning offers powerful tools, existing methods face four key limitations: reliance on narrow features and models, limited long-term forecasting capability, computational inefficiency, and opaque, unvalidated explanations. To overcome these limitations, this paper presents a deep learning framework for phenology classification, utilizing multi-source time series data from satellite imagery, meteorological stations, and field observations. The approach emphasizes temporal consistency, spatial adaptability, computational efficiency, and explainability. A feature engineering pipeline extracts temporal dynamics via lag features, rolling statistics, Fourier transforms and seasonal encodings. Feature selection combines incremental strategies with classical filter, wrapper, and embedded methods. Deep learning models across multiple paradigms-feedforward, recurrent, convolutional, and attention-based-are benchmarked under multi-horizon forecasting tasks. To reduce model complexity while preserving performance where possible, the framework employs knowledge distillation, transferring predictive knowledge from complex teacher models to compact and deployable student models. For model interpretability, a new Hybrid SHAP-Association Rule Explainability approach is proposed, integrating model-driven and data-driven explanations. Agreement between views is quantified using trust metrics: precision@k, coverage, and Jaccard similarity, with a retraining-based validation mechanism. Experiments on phenology data from Andalusia demonstrate high accuracy, strong generalizability, trustworthy explanations and resource-efficient phenology monitoring in agricultural systems.</div></div>","PeriodicalId":50367,"journal":{"name":"Information Fusion","volume":"131 ","pages":"Article 104158"},"PeriodicalIF":15.5,"publicationDate":"2026-01-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145995208","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Fusion of quantum computing with smart agriculture: A systematic review of methods, implementation, applications, and challenges
Pub Date: 2026-01-16 | DOI: 10.1016/j.inffus.2026.104159
Sumit Kumar, Shashank Sheshar Singh, Gourav Bathla, Swati Sharma, Manisha Panjeta
The growing global population and the severity of environmental issues are driving the agriculture sector to adopt innovative technological advances for sustainable food production. Classical computing approaches frequently struggle with the volume and complexity of agricultural data when performing tasks such as crop yield prediction, disease detection, soil analysis, and weather forecasting. This Systematic Literature Review (SLR) provides an in-depth analysis of the evolving significance of quantum computing in smart agriculture. Quantum algorithms have the potential to reduce computational complexity and create novel data representation methods for high-dimensional challenges by leveraging quantum mechanics principles such as superposition and entanglement. This paper employs a structured research methodology based on eight specific research questions to comprehensively investigate over 100 peer-reviewed studies on quantum computing and smart agriculture published between 2012 and 2025. It demonstrates the effectiveness of Quantum Machine Learning (QML), quantum optimization, and hybrid quantum-classical models in various agricultural applications. The survey examines real-world implementations and compares existing quantum initiatives to classical benchmarks for classification and prediction tasks. The presented work identifies challenges and limitations of current quantum approaches. The paper outlines directions for future work, including the accessibility of quantum hardware and the development of domain-specific algorithms. To the best of our knowledge, this is the first research question-driven SLR that provides an in-depth analysis of how quantum computing can be applied in agricultural applications.
{"title":"Fusion of quantum computing with smart agriculture: A systematic review of methods, implementation, applications, and challenges","authors":"Sumit Kumar , Shashank Sheshar Singh , Gourav Bathla , Swati Sharma , Manisha Panjeta","doi":"10.1016/j.inffus.2026.104159","DOIUrl":"10.1016/j.inffus.2026.104159","url":null,"abstract":"<div><div>The growing global population and the severity of environmental issues are driving the agriculture sector to adopt innovative technological advances for sustainable food production. Classical computing approaches frequently struggle with the volume and complexity of agricultural data when performing tasks such as crop yield prediction, disease detection, soil analysis, and weather forecasting. This Systematic Literature Review (SLR) provides an in-depth analysis of the evolving significance of quantum computing in smart agriculture. Quantum algorithms have the potential to reduce computational complexity and create novel data representation methods for high-dimensional challenges by leveraging quantum mechanics principles such as superposition and entanglement. This paper employs a structured research methodology based on eight specific research questions to comprehensively investigate over 100 peer-reviewed studies on quantum computing and smart agriculture published between 2012 and 2025. It demonstrates the effectiveness of Quantum Machine Learning (QML), quantum optimization, and hybrid quantum-classical models in various agricultural applications. The survey examines real-world implementations and compares existing quantum initiatives to classical benchmarks for the classification and prediction tasks. The presented work identifies challenges and limitations of current quantum approaches. The paper outlines directions for future work, including the accessibility of quantum hardware and the development of domain-specific algorithms. To the best of our knowledge, this is the first research question-driven SLR that provides an in-depth analysis of how quantum computing can be applied in agricultural applications.</div></div>","PeriodicalId":50367,"journal":{"name":"Information Fusion","volume":"131 ","pages":"Article 104159"},"PeriodicalIF":15.5,"publicationDate":"2026-01-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145995251","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Negative can be positive: A stable and noise-resistant complementary contrastive learning for cross-modal matching
Pub Date: 2026-01-16 | DOI: 10.1016/j.inffus.2026.104156
Fangming Zhong, Xinyu He, Haiquan Yu, Xiu Liu, Suhua Zhang
Cross-modal matching with noisy correspondence has drawn considerable interest recently, owing to the mismatched pairs inevitably introduced when data are collected from the Internet. Training on such noisy data often leads to severe performance degradation, as conventional methods tend to overfit rapidly to mismatched pairs. Most existing methods focus on predicting more reliable soft correspondences, assigning higher weights to pairs that are more likely to be correct. However, two limitations remain: (1) they ignore the informative signals embedded in negative pairs, and (2) they are unstable because of their sensitivity to the noise ratio. To address these issues, we explicitly take negatives into account and propose a stable and noise-resistant complementary learning method, named Dual Contrastive Learning (DCL), for cross-modal matching with noisy correspondence. DCL leverages both positive and negative pairs to improve robustness; with complementary contrastive learning, the negative pairs also contribute positively to model optimization. Specifically, to fully exploit the potential of mismatched data, we first partition the training data into clean and noisy subsets based on the memorization effect of deep neural networks. We then employ vanilla contrastive learning for matched positive pairs in the clean subset, while complementary contrastive learning is adopted for negative pairs, including those in the noisy subset. In this way, the proposed method remains robust across noise ratios, balancing positive and negative information. Extensive experiments indicate that DCL significantly outperforms state-of-the-art methods and exhibits remarkable stability, with extremely low variance in R@1. Specifically, the R@1 scores of DCL are 7% and 9.1% higher than those of NPC on image-to-text and text-to-image matching, respectively. The source code is released at https://github.com/hxy2969/dcl.
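To make the positive/complementary split concrete, the sketch below pairs a vanilla contrastive (InfoNCE) term on pairs flagged as clean with a complementary term that pushes probability mass away from pairs flagged as noisy; this is an illustrative assumption about how such a dual objective can be written, not the exact DCL loss.

```python
import torch
import torch.nn.functional as F

def dual_contrastive_loss(img_emb, txt_emb, is_clean, tau=0.07):
    """img_emb, txt_emb: (N, d) L2-normalized embeddings of paired samples.
    is_clean: (N,) boolean mask from the clean/noisy data partition."""
    logits = img_emb @ txt_emb.t() / tau               # (N, N) cross-modal similarities
    p_match = logits.softmax(dim=1).diagonal()         # probability assigned to the given pairing
    targets = torch.arange(len(img_emb), device=img_emb.device)

    loss = 0.0
    if is_clean.any():                                 # clean pairs: pull matched pairs together
        loss = loss + F.cross_entropy(logits[is_clean], targets[is_clean])
    if (~is_clean).any():                              # noisy pairs: complementary term that
        loss = loss - torch.log(1.0 - p_match[~is_clean] + 1e-6).mean()  # pushes the (likely wrong) pairing apart
    return loss
```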
{"title":"Negative can be positive: A stable and noise-resistant complementary contrastive learning for cross-modal matching","authors":"Fangming Zhong , Xinyu He , Haiquan Yu , Xiu Liu , Suhua Zhang","doi":"10.1016/j.inffus.2026.104156","DOIUrl":"10.1016/j.inffus.2026.104156","url":null,"abstract":"<div><div>Cross-modal matching with noisy correspondence has drawn considerable interest recently, due to the mismatched data imposed inevitably when collecting data from the Internet. Training on such noisy data often leads to severe performance degradation, as conventional methods tend to overfit rapidly to wrongly mismatched pairs. Most of the existing methods focus on predicting more reliable soft correspondence, generating higher weights for the pairs that are more likely to be correct. However, there still remain two limitations: (1) they ignore the informative signals embedded in the negative pairs, and (2) the instability of existing methods due to their sensitivity to the noise ratio. To address these issues, we explicitly take the negatives into account and propose a stable and noise-resistant complementary learning method, named Dual Contrastive Learning (DCL), for cross-modal matching with noisy correspondence. DCL leverages both positive pairs and negative pairs to improve the robustness. With the complementary contrastive learning, the negative pairs also contribute positively to the model optimization. Specifically, to fully explore the potential of mismatched data, we first partition the training data into clean and noisy subsets based on the memorization effect of deep neural networks. Then, we employ vanilla contrastive learning for positive matched pairs in the clean subset. As for negative pairs including the noisy subsets, complementary contrastive learning is adopted. In such doing, whatever the level of noise ratio is, the proposed method is robust to balance the positive information and negative information. Extensive experiments indicate that DCL significantly outperforms the state-of-the-art methods and exhibits remarkable stability with an extremely low variance of R@1. Specifically, the R@1 scores of our DCL are 7% and 9.1% higher than NPC on image-to-text and text-to-image, respectively. The source code is released at <span><span>https://github.com/hxy2969/dcl</span><svg><path></path></svg></span>.</div></div>","PeriodicalId":50367,"journal":{"name":"Information Fusion","volume":"130 ","pages":"Article 104156"},"PeriodicalIF":15.5,"publicationDate":"2026-01-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145995210","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
MulMoSenT: Multimodal sentiment analysis for a low-resource language using textual-visual cross-attention and fusion
Pub Date: 2026-01-15 | DOI: 10.1016/j.inffus.2026.104129
Sadia Afroze, Md. Rajib Hossain, Mohammed Moshiul Hoque, Nazmul Siddique
The widespread availability of the Internet and the growing use of smart devices have fueled the rapid expansion of multimodal (image-text) sentiment analysis (MSA), a burgeoning research field driven by the massive volume of image-text data these technologies generate. However, MSA faces significant challenges, notably misalignment between images and text, where an image may carry multiple interpretations or contradict its paired text. In addition, short textual content often lacks sufficient context, complicating sentiment prediction. These issues are particularly acute in low-resource languages, where annotated image-text corpora are scarce and Vision-Language Models (VLMs) and Large Language Models (LLMs) exhibit limited performance. This research introduces MulMoSenT, a multimodal image-text sentiment analysis system tailored to tackle these challenges for low-resource languages. The development of MulMoSenT unfolds across four key phases: corpus development, baseline model evaluation and selection, hyperparameter adaptation, and model fine-tuning and inference. The proposed MulMoSenT model achieves a peak accuracy of 84.90%, surpassing all baseline models, and delivers a 37.83% improvement over VLMs, a 35.28% gain over image-only models, and a 0.71% enhancement over text-only models. Both the dataset and the solution are publicly accessible at: https://github.com/sadia-afroze/MulMoSenT.
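A minimal sketch of the textual-visual cross-attention fusion idea: text token features attend to image patch features and the attended result is fused back into the text stream before classification; dimensions and layer choices here are illustrative assumptions, not MulMoSenT's exact architecture.

```python
import torch
import torch.nn as nn

class CrossAttentionFusion(nn.Module):
    def __init__(self, dim=256, heads=4, num_classes=3):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)
        self.classifier = nn.Linear(dim, num_classes)

    def forward(self, text_feats, image_feats):
        # text_feats: (B, T, dim) token features; image_feats: (B, P, dim) patch features
        attended, _ = self.attn(query=text_feats, key=image_feats, value=image_feats)
        fused = self.norm(text_feats + attended)       # residual fusion of the two modalities
        return self.classifier(fused.mean(dim=1))      # pooled sentiment logits

# Usage: logits = CrossAttentionFusion()(torch.randn(2, 16, 256), torch.randn(2, 49, 256))
```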
{"title":"MulMoSenT: Multimodal sentiment analysis for a low-resource language using textual-visual cross-attention and fusion","authors":"Sadia Afroze , Md. Rajib Hossain , Mohammed Moshiul Hoque , Nazmul Siddique","doi":"10.1016/j.inffus.2026.104129","DOIUrl":"10.1016/j.inffus.2026.104129","url":null,"abstract":"<div><div>The widespread availability of the Internet and the growing use of smart devices have fueled the rapid expansion of multimodal (image-text) sentiment analysis (MSA), a burgeoning research field. This growth is driven by the massive volume of image-text data generated by these technologies. However, MSA faces significant challenges, notably the misalignment between images and text, where an image may carry multiple interpretations or contradict its paired text. In addition, short textual content often lacks sufficient context, complicating sentiment prediction. These issues are particularly acute in low-resource languages, where annotated image-text corpora are scarce, and Vision-Language Models (VLMs) and Large Language Models (LLMs) exhibit limited performance. This research introduces <strong>MulMoSenT</strong>, a multimodal image-text sentiment analysis system tailored to tackle these challenges for low-resource languages. The development of <strong>MulMoSenT</strong> unfolds across four key phases: corpus development, baseline model evaluation and selection, hyperparameter adaptation, and model fine-tuning and inference. The proposed <strong>MulMoSenT</strong> model achieves a peak accuracy of 84.90%, surpassing all baseline models. Delivers a 37. 83% improvement over VLMs, a 35.28% gain over image-only models, and a 0.71% enhancement over text-only models. Both the dataset and the solution are publicly accessible at: <span><span>https://github.com/sadia-afroze/MulMoSenT</span><svg><path></path></svg></span>.</div></div>","PeriodicalId":50367,"journal":{"name":"Information Fusion","volume":"131 ","pages":"Article 104129"},"PeriodicalIF":15.5,"publicationDate":"2026-01-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145995211","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
ExInCOACH: Strategic exploration meets interactive tutoring for context-aware game onboarding
Pub Date: 2026-01-14 | DOI: 10.1016/j.inffus.2026.104151
Rui Hua, Zhaoyu Huang, Jinhao Lu, Yakun Li, Na Zhao
Traditional game tutorials often fail to deliver real-time contextual guidance, providing static instructions disconnected from dynamic gameplay states. This limitation stems from their inability to interpret evolving game environments and generate high-quality decisions during live player interactions. We present ExInCOACH, a hybrid framework that synergizes exploratory reinforcement learning (RL) with interactive large language models (LLMs) to enable state-aware adaptive tutoring. Our framework first employs deep RL to discover strategic patterns via self-play, constructing a Q-function. During player onboarding, LLMs map the Q-values of currently legal actions and their usage conditions into natural language rule explanations and strategic advice by analyzing live game states and player decisions.
Evaluations in Dou Di Zhu (a turn-based card game) reveal that learners using ExInCOACH experienced intuitive strategy internalization: all participants reported grasping advanced tactics faster than through rule-based tutorials, while most players highly valued the real-time contextual feedback. A comparative study demonstrated that players trained with ExInCOACH achieved a 70% win rate (14 wins/20 games) against those onboarded via traditional methods, as they benefited from adaptive guidance that evolved with their skill progression. To further validate the framework's generalizability, evaluations were also conducted in StarCraft II, a high-complexity real-time strategy (RTS) game. In 2v2 cooperative battles, teams trained with ExInCOACH achieved a 66.7% win rate against teams assisted by Vision LLMs (VLLMs) and a 100% win rate against teams relying on traditional static game wikis for learning. Cognitive load assessments indicated that ExInCOACH significantly reduced players' mental burden and frustration in complex scenarios involving real-time decision-making and multi-unit collaboration, while also outperforming traditional methods in information absorption efficiency and tactical adaptability. This work proposes a game tutorial design paradigm based on RL model exploration and LLM rule interpretation, making AI-generated strategies accessible through natural language interaction tailored to individual learning contexts.
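A minimal sketch of the RL-to-LLM handoff described above: rank the currently legal actions by their Q-values and package them into a natural-language coaching prompt; `q_function` and `query_llm` are hypothetical placeholders, not ExInCOACH's API.

```python
def build_coaching_prompt(state_description, legal_actions, q_function, top_k=3):
    """Turn Q-values over legal actions into a prompt an LLM can explain to the player."""
    scored = sorted(((q_function(state_description, a), a) for a in legal_actions),
                    key=lambda qa: qa[0], reverse=True)
    ranked = "\n".join(f"{i + 1}. {a} (Q={q:.2f})"
                       for i, (q, a) in enumerate(scored[:top_k]))
    return (
        "You are a game tutor. Current state:\n"
        f"{state_description}\n"
        "The strongest legal moves according to the learned value function are:\n"
        f"{ranked}\n"
        "Explain in plain language why the top move is strong and when to use it."
    )

# advice = query_llm(build_coaching_prompt(state, actions, q_function))  # hypothetical LLM call
```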
{"title":"ExInCOACH: Strategic exploration meets interactive tutoring for context-aware game onboarding","authors":"Rui Hua , Zhaoyu Huang , Jinhao Lu , Yakun Li , Na Zhao","doi":"10.1016/j.inffus.2026.104151","DOIUrl":"10.1016/j.inffus.2026.104151","url":null,"abstract":"<div><div>Traditional game tutorials often fail to deliver real-time contextual guidance, providing static instructions disconnected from dynamic gameplay states. This limitation stems from their inability to interpret evolving game environments and generate high-quality decisions during live player interactions. We present ExInCOACH, a hybrid framework that synergizes exploratory reinforcement learning (RL) with interactive large language models (LLMs) to enable state-aware adaptive tutoring. Our framework first employs deep RL to discover strategic patterns via self-play, constructing a Q-function. During player onboarding, LLMs map the Q-values of currently legal actions and their usage conditions into natural language rule explanations and strategic advice by analyzing live game states and player decisions.</div><div>Evaluations in Dou Di Zhu (a turn-based card game) reveal that learners using ExInCOACH experienced intuitive strategy internalization-all participants reported grasping advanced tactics faster than through rule-based tutorials, while most players highly valued the real-time contextual feedback. A comparative study demonstrated that players trained with ExInCOACH achieved a 70% win rate (14 wins/20 games) against those onboarded via traditional methods, as they benefited from adaptive guidance that evolved with their skill progression. To further validate the framework’s generalizability, evaluations were also conducted in StarCraft II, a high-complexity real-time strategy (RTS) game. In 2v2 cooperative battles, teams trained with ExInCOACH achieved a 66.7% win rate against teams assisted by Vision LLMs (VLLMs) and an impressive 100% win rate against teams relying on traditional static game wikis for learning. Cognitive load assessments indicated that ExInCOACH significantly reduced players- mental burden and frustration in complex scenarios involving real-time decision-making and multi-unit collaboration, while also outperforming traditional methods in information absorption efficiency and tactical adaptability. This work proposes a game tutorial design paradigm based on RL model exploration & LLM rule interpretation, making AI-generated strategies accessible through natural language interaction tailored to individual learning contexts.</div></div>","PeriodicalId":50367,"journal":{"name":"Information Fusion","volume":"131 ","pages":"Article 104151"},"PeriodicalIF":15.5,"publicationDate":"2026-01-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145995212","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
GeoCraft: A Diffusion Model-based 3D Reconstruction Method driven by image and point cloud fusion
Pub Date: 2026-01-13 | DOI: 10.1016/j.inffus.2026.104149
Weixuan Ma, Yamin Li, Chujin Liu, Hao Zhang, Jie Li, Kansong Chen, Weixuan Gao
With the rapid development of technologies such as virtual reality (VR), autonomous driving, and digital twins, the demand for high-precision, realistic multimodal 3D reconstruction has surged. This technology has become a core research focus in computer vision and graphics due to its ability to integrate multi-source data such as 2D images and point clouds. However, existing methods face challenges including geometric inconsistency in single-view reconstruction, poor point cloud-to-mesh conversion, and insufficient multimodal feature fusion, limiting their practical application. To address these issues, this paper proposes GeoCraft, a multimodal 3D reconstruction method that generates high-precision 3D models from 2D images through three collaborative stages: Diff2DPoint, Point2DMesh, and Vision3DGen. Specifically, Diff2DPoint generates an initial, geometrically aligned point cloud using a diffusion model and projection feature fusion; Point2DMesh converts the point cloud into a high-quality mesh using an autoregressive decoder-only Transformer and Direct Preference Optimization (DPO); and Vision3DGen creates high-fidelity 3D objects through multimodal feature alignment. Experiments on the Google Scanned Objects (GSO) and Pix3D datasets show that GeoCraft excels on key metrics: on GSO, its CMMD is 2.810 and FID-CLIP is 26.420; on Pix3D, CMMD is 3.020 and FID-CLIP is 27.030. GeoCraft significantly outperforms existing 3D reconstruction methods and also demonstrates advantages in computational efficiency, effectively addressing key challenges in 3D reconstruction. The code is available at https://github.com/weixuanma/GeoCraft.
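For context on the evaluation, the sketch below computes a Gaussian-kernel maximum mean discrepancy between CLIP embeddings of real and generated renderings, the kind of distance underlying CMMD-style scores; the kernel choice and bandwidth are assumptions, not the paper's exact protocol.

```python
import numpy as np

def gaussian_kernel(x, y, sigma=10.0):
    """Pairwise Gaussian kernel between (n, d) and (m, d) embedding sets."""
    d2 = ((x[:, None, :] - y[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2.0 * sigma ** 2))

def mmd2(real_emb, gen_emb, sigma=10.0):
    """Biased squared-MMD estimate; lower means the generated set matches the real set better."""
    kxx = gaussian_kernel(real_emb, real_emb, sigma).mean()
    kyy = gaussian_kernel(gen_emb, gen_emb, sigma).mean()
    kxy = gaussian_kernel(real_emb, gen_emb, sigma).mean()
    return kxx + kyy - 2.0 * kxy

# Usage: score = mmd2(np.random.randn(100, 512), np.random.randn(100, 512))
```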
{"title":"GeoCraft: A Diffusion Model-based 3D Reconstruction Method driven by image and point cloud fusion","authors":"Weixuan Ma , Yamin Li , Chujin Liu , Hao Zhang , Jie Li , Kansong Chen , Weixuan Gao","doi":"10.1016/j.inffus.2026.104149","DOIUrl":"10.1016/j.inffus.2026.104149","url":null,"abstract":"<div><div>With the rapid development of technologies like virtual reality (VR), autonomous driving, and digital twins, the demand for high-precision and realistic multimodal 3D reconstruction has surged. This technology has become a core research focus in computer vision and graphics due to its ability to integrate multi-source data, such as 2D images and point clouds. However, existing methods face challenges such as geometric inconsistency in single-view reconstruction, poor point cloud-to-mesh conversion, and insufficient multimodal feature fusion, limiting their practical application. To address these issues, this paper proposes GeoCraft, a multimodal 3D reconstruction method that generates high-precision 3D models from 2D images through three collaborative stages: Diff2DPoint, Point2DMesh, and Vision3DGen. Specifically, Diff2DPoint generates an initial point cloud with geometric alignment using a diffusion model and projection feature fusion; Point2DMesh converts the point cloud into a high-quality mesh using an autoregressive decoder-only Transformer and Direct Preference Optimization (DPO); Vision3DGen creates high-fidelity 3D objects through multimodal feature alignment. Experiments on the Google Scanned Objects (GSO) and Pix3D datasets show that GeoCraft excels in key metrics. On the GSO dataset, its CMMD is 2.810 and FID<sub>CLIP</sub> is 26.420; on Pix3D, CMMD is 3.020 and FID<sub>CLIP</sub> is 27.030. GeoCraft significantly outperforms existing 3D reconstruction methods and also demonstrates advantages in computational efficiency, effectively solving key challenges in 3D reconstruction.The code is available at <span><span>https://github.com/weixuanma/GeoCraft</span><svg><path></path></svg></span>.</div></div>","PeriodicalId":50367,"journal":{"name":"Information Fusion","volume":"131 ","pages":"Article 104149"},"PeriodicalIF":15.5,"publicationDate":"2026-01-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145961755","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
GC-Fed: Gradient centralized federated learning with partial client participation
Pub Date: 2026-01-13 | DOI: 10.1016/j.inffus.2026.104148
Jungwon Seo, Ferhat Ozgur Catak, Chunming Rong, Kibeom Hong, Minhoe Kim
Federated Learning (FL) enables privacy-preserving multi-source information fusion (MSIF) but suffers from client drift in highly heterogeneous data settings. Many existing approaches mitigate drift by providing clients with common reference points, typically derived from past information, to align objectives or gradient directions. However, under severe partial participation, such history-dependent references may become unreliable, as the set of client data distributions participating in each round can vary drastically. To overcome this limitation, we propose a method that mitigates client drift without relying on past information by constraining the update space through Gradient Centralization (GC). Specifically, we introduce Local GC and Global GC, which apply GC at the local and global update stages, respectively, and further present GC-Fed, a hybrid formulation that generalizes both. Theoretical analysis and extensive experiments on benchmark FL tasks demonstrate that GC-Fed effectively alleviates client drift and achieves up to a 20% accuracy improvement under data-heterogeneous and partial-participation conditions.
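A minimal sketch of the Gradient Centralization operation itself: each multi-dimensional weight gradient is centered by subtracting its mean over the non-output dimensions before the optimizer step. Placing this inside a client's local update loop illustrates the idea behind Local GC; the exact GC-Fed placement and the global variant follow the paper, not this sketch.

```python
import torch

def centralize_gradients(model):
    """Center every multi-dimensional weight gradient (skip biases / 1-D parameters)."""
    for p in model.parameters():
        if p.grad is None or p.grad.dim() <= 1:
            continue
        dims = tuple(range(1, p.grad.dim()))           # all dims except the output dimension
        p.grad -= p.grad.mean(dim=dims, keepdim=True)

# Inside a client's local training step (sketch):
# loss.backward()
# centralize_gradients(model)    # constrain the update direction before stepping
# optimizer.step()
```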
{"title":"GC-Fed: Gradient centralized federated learning with partial client participation","authors":"Jungwon Seo , Ferhat Ozgur Catak , Chunming Rong , Kibeom Hong , Minhoe Kim","doi":"10.1016/j.inffus.2026.104148","DOIUrl":"10.1016/j.inffus.2026.104148","url":null,"abstract":"<div><div>Federated Learning (FL) enables privacy-preserving multi-source information fusion (MSIF) but suffers from client drift in highly heterogeneous data settings. Many existing approaches mitigate drift by providing clients with common reference points, typically derived from past information, to align objectives or gradient directions. However, under severe partial participation, such history-dependent references may become unreliable, as the set of client data distributions participating in each round can vary drastically. To overcome this limitation, we propose a method that mitigates client drift without relying on past information by constraining the update space through Gradient Centralization (GC). Specifically, we introduce <span>Local GC</span> and <span>Global GC</span>, which apply GC at the local and global update stages, respectively, and further present <span>GC-Fed</span>, a hybrid formulation that generalizes both. Theoretical analysis and extensive experiments on benchmark FL tasks demonstrate that <span>GC-Fed</span> effectively alleviates client drift and achieves up to 20 % accuracy improvement under data heterogeneous and partial participation conditions.</div></div>","PeriodicalId":50367,"journal":{"name":"Information Fusion","volume":"131 ","pages":"Article 104148"},"PeriodicalIF":15.5,"publicationDate":"2026-01-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145962592","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
SymUnet-DynCFC: Multimodal MRI fusion for robust cartilage segmentation and clinically confirmed moderate-to-severe KOA diagnosis
Pub Date: 2026-01-12 | DOI: 10.1016/j.inffus.2026.104145
Li Li, Jianbing Ma, Beiji Zou, Hao Xu, Shenghui Liao, Wenyi Xiong, Liqiang Zhi
Knee osteoarthritis (KOA) is a globally prevalent degenerative joint disorder. A central challenge in its automated diagnosis is the efficient fusion of multimodal MRI data. This fusion aims to enhance the accuracy and generalizability of clinical cartilage segmentation, while simultaneously minimizing healthcare resource consumption. Therefore, this study introduces dynamic confidence fuzzy control (DynCFC) within the symmetric unet architecture (SymUnet), referred to as SymUnet-DynCFC, which is designed to enhance the accuracy and robustness of cartilage segmentation. Firstly, the SymUnet architecture is developed, with separate inputs from T1W and T2W modalities to facilitate comprehensive segmentation evaluation. Secondly, the DynCFC mechanism is implemented to compute the optimal weighting for each modality, enabling the fusion and optimization of multimodal features. Finally, the performance of the proposed SymUnet-DynCFC method is evaluated on clinical datasets from a multi-campus hospital system. Experimental results show that SymUnet-DynCFC achieves better segmentation performance than the baselines, with mean Dice, IoU, and HD95 values of 87.96 %, 79.93 %, and 1.29, respectively. In particular, SymUnet-DynCFC exhibits improved robustness compared to the baseline methods. This may facilitate automated cartilage segmentation in clinical workflows and could support the assessment of moderate-to-severe KOA by detecting outlier metrics.
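A minimal sketch of confidence-weighted fusion of the two MRI streams: per-modality confidence scores are normalized into weights that blend the T1W and T2W feature maps; this illustrates the weighting idea only, not the DynCFC fuzzy-control rules.

```python
import torch
import torch.nn as nn

class ConfidenceWeightedFusion(nn.Module):
    def __init__(self, channels):
        super().__init__()
        # One scalar confidence per modality, predicted from globally pooled features.
        self.conf_t1 = nn.Linear(channels, 1)
        self.conf_t2 = nn.Linear(channels, 1)

    def forward(self, feat_t1, feat_t2):               # both: (B, C, H, W)
        pooled_t1 = feat_t1.mean(dim=(2, 3))            # (B, C) global average pooling
        pooled_t2 = feat_t2.mean(dim=(2, 3))
        conf = torch.cat([self.conf_t1(pooled_t1), self.conf_t2(pooled_t2)], dim=1)
        w = conf.softmax(dim=1)                          # (B, 2) normalized modality weights
        return w[:, 0, None, None, None] * feat_t1 + w[:, 1, None, None, None] * feat_t2

# Usage: fused = ConfidenceWeightedFusion(32)(torch.randn(2, 32, 64, 64), torch.randn(2, 32, 64, 64))
```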
{"title":"SymUnet-DynCFC: Multimodal MRI fusion for robust cartilage segmentation and clinically confirmed moderate-to-severe KOA diagnosis","authors":"Li Li , Jianbing Ma , Beiji Zou , Hao Xu , Shenghui Liao , Wenyi Xiong , Liqiang Zhi","doi":"10.1016/j.inffus.2026.104145","DOIUrl":"10.1016/j.inffus.2026.104145","url":null,"abstract":"<div><div>Knee osteoarthritis (KOA) is a globally prevalent degenerative joint disorder. A central challenge in its automated diagnosis is the efficient fusion of multimodal MRI data. This fusion aims to enhance the accuracy and generalizability of clinical cartilage segmentation, while simultaneously minimizing healthcare resource consumption. Therefore, this study introduces dynamic confidence fuzzy control (DynCFC) within the symmetric unet architecture (SymUnet), referred to as SymUnet-DynCFC, which is designed to enhance the accuracy and robustness of cartilage segmentation. Firstly, the SymUnet architecture is developed, with separate inputs from T1W and T2W modalities to facilitate comprehensive segmentation evaluation. Secondly, the DynCFC mechanism is implemented to compute the optimal weighting for each modality, enabling the fusion and optimization of multimodal features. Finally, the performance of the proposed SymUnet-DynCFC method is evaluated on clinical datasets from a multi-campus hospital system. Experimental results show that SymUnet-DynCFC achieves better segmentation performance than the baselines, with mean Dice, IoU, and HD95 values of 87.96 %, 79.93 %, and 1.29, respectively. In particular, SymUnet-DynCFC exhibits improved robustness compared to the baseline methods. This may facilitate automated cartilage segmentation in clinical workflows and could support the assessment of moderate-to-severe KOA by detecting outlier metrics.</div></div>","PeriodicalId":50367,"journal":{"name":"Information Fusion","volume":"130 ","pages":"Article 104145"},"PeriodicalIF":15.5,"publicationDate":"2026-01-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145957304","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Data science: a natural ecosystem
Pub Date: 2026-01-12 | DOI: 10.1016/j.inffus.2025.104113
Emilio Porcu, Roy El Moukari, Laurent Najman, Francisco Herrera, Horst Simon
This manuscript provides a systemic and data-centric view of what we term essential data science, as a natural ecosystem whose challenges and missions stem from fusing the data universe, with its multiple combinations of the 5D complexities (data structure, domain, cardinality, causality, and ethics), with the phases of the data life cycle. Data agents perform tasks driven by specific goals. The data scientist is an abstract entity that emerges from the logical organization of data agents and their actions. Data scientists face challenges that are defined according to the missions. We define specific discipline-induced data science, which in turn allows for the definition of pan-data science, a natural ecosystem that integrates specific disciplines with essential data science. We semantically split essential data science into computational and foundational branches. By formalizing this ecosystemic view, we contribute a general-purpose, fusion-oriented architecture for integrating heterogeneous knowledge, agents, and workflows, relevant to a wide range of disciplines and high-impact applications.
{"title":"Data science: a natural ecosystem","authors":"Emilio Porcu , Roy El Moukari , Laurent Najman , Francisco Herrera , Horst Simon","doi":"10.1016/j.inffus.2025.104113","DOIUrl":"10.1016/j.inffus.2025.104113","url":null,"abstract":"<div><div>This manuscript provides a systemic and data-centric view of what we term <em>essential</em> data science, as a <em>natural</em> ecosystem with challenges and missions stemming from the fusion of data universe with its multiple combinations of the 5D complexities (data structure, domain, cardinality, causality, and ethics) with the phases of the data life cycle. Data agents perform tasks driven by specific <em>goals</em>. The data scientist is an abstract entity that comes from the logical organization of data agents with their actions. Data scientists face challenges that are defined according to the <em>missions</em>. We define specific discipline-induced data science, which in turn allows for the definition of <em>pan</em>-data science, a natural ecosystem that integrates specific disciplines with the essential data science. We semantically split the essential data science into computational, and foundational. By formalizing this ecosystemic view, we contribute a general-purpose, fusion-oriented architecture for integrating heterogeneous knowledge, agents, and workflows-relevant to a wide range of disciplines and high-impact applications.</div></div>","PeriodicalId":50367,"journal":{"name":"Information Fusion","volume":"130 ","pages":"Article 104113"},"PeriodicalIF":15.5,"publicationDate":"2026-01-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145957302","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}