
Information Fusion: Latest Publications

Advances in DeepFake detection algorithms: Exploring fusion techniques in single and multi-modal approach
IF 14.7 | Tier 1, Computer Science | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2025-02-05 | DOI: 10.1016/j.inffus.2025.102993
Ashish Kumar , Divya Singh , Rachna Jain , Deepak Kumar Jain , Chenquan Gan , Xudong Zhao
In recent years, generative artificial intelligence has gained momentum and can now create extremely realistic synthetic multimedia content capable of spreading misinformation and misleading society. Deepfake detection encompasses the frameworks, algorithms, and approaches used to identify manipulated content, namely images, audio, and video. To this end, we analyze and explore various deepfake detection frameworks, categorizing them as single-modal or multi-modal approaches. For better understanding and clarity, single-modal approaches are further divided into conventional and advanced techniques. Conventional techniques extract complementary handcrafted features and classify them using machine-learning algorithms, whereas advanced techniques adopt deep learning and hybrid algorithms to detect deepfakes. Multi-modal techniques use two or more modalities for feature extraction and fuse them to obtain the final classification scores; these techniques are likewise categorized as deep learning or hybrid techniques. The complementary features, multiple modalities, and deep learning models are fused adaptively using score-level or feature-level fusion. The advantages, features, practical applications, and limitations of each category are highlighted to address open challenges and identify future trends for countering deepfakes. In addition, recommendations are given for evaluating the potential of artificial intelligence in deepfake detection toward a safer and more reliable digital world.
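For readers less familiar with the distinction, a minimal sketch of the two fusion strategies named above (feature-level versus score-level fusion) is given below. It is purely illustrative: the encoders, dimensions, and the real/fake class count are placeholder assumptions, not any particular surveyed detector.

```python
import torch
import torch.nn as nn

# Minimal sketch: feature-level vs. score-level fusion of two modality branches.
# Encoders and dimensions are illustrative placeholders, not the surveyed models.

class FeatureLevelFusion(nn.Module):
    def __init__(self, dim=128):
        super().__init__()
        self.visual_enc = nn.Linear(512, dim)    # stand-in visual feature extractor
        self.audio_enc = nn.Linear(256, dim)     # stand-in audio feature extractor
        self.classifier = nn.Linear(2 * dim, 2)  # real / fake

    def forward(self, visual, audio):
        # Fuse features first, then classify the joint representation.
        fused = torch.cat([self.visual_enc(visual), self.audio_enc(audio)], dim=-1)
        return self.classifier(fused)

def score_level_fusion(visual_logits, audio_logits, w=0.5):
    # Each branch classifies independently; their softmax scores are combined.
    v = torch.softmax(visual_logits, dim=-1)
    a = torch.softmax(audio_logits, dim=-1)
    return w * v + (1 - w) * a
```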
Citations: 0
Modality-perceptive harmonization network for visible-infrared person re-identification
IF 14.7 | Tier 1, Computer Science | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2025-02-05 | DOI: 10.1016/j.inffus.2025.102979
Xutao Zuo , Jinjia Peng , Tianhang Cheng , Huibing Wang
Visible-infrared person re-identification (VI-ReID) remains a challenging task due to the distributional and semantic inconsistencies between heterogeneous modalities. Some visible-infrared person re-identification methods that leverage auxiliary modalities have achieved significant progress. However, these methods merely apply pixel-level augmentation to the original images and neglect dynamic modeling of modality-shared information, limiting their ability to reconcile modality discrepancies and capture cross-modal semantic correspondences. To address this, this paper proposes a Modality-perceptive Harmonization Network (MHN) that achieves feature-level harmonization by leveraging the coherence between visible and infrared modalities. Specifically, to alleviate domain discrepancies, a Modality-Perceptive Aggregation Module (MAM) is proposed to explicitly capture cross-modality consistency between heterogeneous modalities, thereby facilitating the adaptive fusion of a harmonious hybrid modality and the extraction of reliable modality-shared features. Moreover, a modality harmonization loss is proposed to adjust the distribution of the generated hybrid modality and align the feature distributions across modalities. To address semantic inconsistency, a Dimensional Refinement Module (DRM) is proposed to decouple semantic information along the channel and spatial dimensions, further enhancing intra-modality diversity. Simultaneously, a modality consistency loss is designed to strengthen the identity-related coherence of heterogeneous modalities, further enhancing inter-modality semantic consistency. Extensive experiments on the SYSU-MM01, RegDB, and LLCM datasets demonstrate the effectiveness of our model, and a series of ablation studies further validates the significant contributions of each component of our method.
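As a rough illustration of feature-level harmonization of two modalities, the sketch below gates visible and infrared features into a single hybrid representation and adds a simple distribution-alignment penalty. It is a generic toy example under assumed feature dimensions, not the paper's MAM, DRM, or harmonization loss.

```python
import torch
import torch.nn as nn

# Generic sketch: fuse visible and infrared features into a hybrid representation
# with a learned gate, plus a moment-matching penalty as a stand-in alignment loss.

class HybridModalityFusion(nn.Module):
    def __init__(self, dim=256):
        super().__init__()
        self.gate = nn.Sequential(nn.Linear(2 * dim, dim), nn.Sigmoid())

    def forward(self, f_visible, f_infrared):
        # Per-dimension gate decides how much each modality contributes.
        g = self.gate(torch.cat([f_visible, f_infrared], dim=-1))
        return g * f_visible + (1 - g) * f_infrared

def alignment_loss(f_visible, f_infrared):
    # Match first and second moments of the two modality feature distributions.
    mean_gap = (f_visible.mean(0) - f_infrared.mean(0)).pow(2).sum()
    var_gap = (f_visible.var(0) - f_infrared.var(0)).pow(2).sum()
    return mean_gap + var_gap
```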
Citations: 0
RK-VQA: Rational knowledge-aware fusion-in-decoder for knowledge-based visual question answering
IF 14.7 | Tier 1, Computer Science | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2025-02-05 | DOI: 10.1016/j.inffus.2025.102969
Weipeng Chen , Xu Huang , Zifeng Liu , Jin Liu , Lan Yo
Knowledge-based Visual Question Answering (KB-VQA) extends traditional VQA by drawing on world knowledge from external sources when the image alone is insufficient to infer a correct answer. Existing methods face challenges due to low recall rates, which limit their ability to gather the information needed for accurate answers. While increasing the number of retrieved knowledge entries can improve recall, it often introduces irrelevant information that impairs model performance. To overcome these challenges, we propose RK-VQA, which comprises two components. First, a zero-shot weighted hybrid knowledge retrieval method integrates local and global visual features with textual features from image–question pairs, enhancing the quality of knowledge retrieval and improving recall rates. Second, a rational knowledge-aware Fusion-in-Decoder architecture enhances answer generation by focusing on rational knowledge and reducing the influence of irrelevant information. Specifically, we develop a rational module to extract rational features, which are then used to prioritize pertinent information via a novel rational knowledge-aware attention mechanism. We evaluate RK-VQA on OK-VQA, the largest knowledge-based VQA dataset. The results show that RK-VQA achieves strong performance, recording an accuracy of 64.11% and surpassing the previous best result by 2.03%.
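The weighted hybrid retrieval idea, i.e., ranking knowledge entries by a weighted combination of visual and textual similarity, can be sketched as follows. The weighting scheme, the embedding inputs, and the `alpha` parameter are assumptions for illustration, not the paper's exact formulation.

```python
import numpy as np

# Illustrative sketch of weighted hybrid retrieval: rank knowledge entries by a
# weighted sum of visual-key and text-key similarities to the query.

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8)

def hybrid_retrieve(visual_emb, question_emb, knowledge_embs, alpha=0.6, top_k=5):
    """knowledge_embs: list of (visual_key, text_key) embedding pairs."""
    scores = []
    for v_key, t_key in knowledge_embs:
        score = alpha * cosine(visual_emb, v_key) + (1 - alpha) * cosine(question_emb, t_key)
        scores.append(score)
    order = np.argsort(scores)[::-1][:top_k]  # highest combined score first
    return order, [scores[i] for i in order]
```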
Citations: 0
Next-generation coupled structure-human sensing technology: Enhanced pedestrian-bridge interaction analysis using data fusion and machine learning
IF 14.7 | Tier 1, Computer Science | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2025-02-05 | DOI: 10.1016/j.inffus.2025.102983
Sahar Hassani , Samir Mustapha , Jianchun Li , Mohsen Mousavi , Ulrike Dackermann
The consequences of crowd behavior in high-density pedestrian flows, especially in response to exacerbating incidents, can be tragic, including trampling and crushing, making active monitoring of crowd motion crucial for providing timely danger warnings and implementing preventive measures. This paper proposes a novel approach for crowd behavior monitoring and bridge load prediction based on the following innovative solutions: (a) advanced optimized signal processing is leveraged for noise reduction; (b) novel data fusion approaches are proposed to extract the most informative measurement features; (c) fine-tuned machine-learning techniques are implemented for classification and regression tasks. Data from structure-based sensors and wearable devices were used to capture movement- and load-sensitive data on a pedestrian bridge, enabling the determination of crowd flow, density, and bridge loading information. The proposed monitoring approach explores signal preprocessing methodologies, including variational mode decomposition (VMD), downsampling, principal component analysis, and novel data fusion, to effectively minimize noise and errors in the input data. Data fusion strategies were introduced to significantly enhance the learning models and improve the overall efficiency and resilience of the system. For further analysis, a 2D convolutional neural network (CNN) approach was initially applied independently to the sensing sources and subsequently extended to fuse multimodal raw, decomposed, and denoised data. The proposed monitoring method was validated using experimental data obtained from crowd simulations conducted on a scaled-down bridge panel, utilizing next-generation coupled structure-human sensing, fiber-optic sensing, and smartphone technology. The results demonstrated highly accurate crowd monitoring predictions, with peak testing accuracy reaching 99.62% for single-class crowd flow classification, 98.69% for multiclass crowd flow and density classification, and an R² score of 98.42% for load estimation when fusing denoised signals using VMD. The proposed 2D-CNN model was compared with an existing adaptive Kalman filter (AKF) fusion technique and various machine learning techniques, including random forest, k-nearest neighbor, support vector machine, XGBoost, and ensemble methods. This comparison confirmed the robustness and superiority of the proposed monitoring approach.
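A compressed sketch of the preprocessing-and-fusion pipeline described above (downsampling, PCA, and channel-wise stacking of the sensing sources before a 2D-CNN) is shown below; VMD denoising and the CNN itself are omitted, and all sizes are illustrative assumptions.

```python
import numpy as np
from sklearn.decomposition import PCA

# Simplified sketch: downsample each sensing source, reduce dimensionality with
# PCA, and stack the sources as channels of one array that a 2D-CNN could consume.

def preprocess_and_fuse(fiber_signals, phone_signals, n_components=8, step=4):
    # fiber_signals, phone_signals: arrays of shape (n_windows, n_samples)
    fiber_ds = fiber_signals[:, ::step]   # crude downsampling stand-in
    phone_ds = phone_signals[:, ::step]
    fiber_pc = PCA(n_components=n_components).fit_transform(fiber_ds)
    phone_pc = PCA(n_components=n_components).fit_transform(phone_ds)
    # Feature-level fusion: stack sources as channels of one input tensor.
    fused = np.stack([fiber_pc, phone_pc], axis=1)  # (n_windows, 2, n_components)
    return fused
```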
Citations: 0
Distilling implicit multimodal knowledge into large language models for zero-resource dialogue generation
IF 14.7 | Tier 1, Computer Science | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2025-02-04 | DOI: 10.1016/j.inffus.2025.102985
Bo Zhang , Hui Ma , Jian Ding , Jian Wang , Bo Xu , Hongfei Lin
Integrating multimodal knowledge into large language models (LLMs) represents a significant advancement in dialogue generation capabilities. However, the effective incorporation of such knowledge in zero-resource scenarios remains a substantial challenge due to the scarcity of diverse, high-quality dialogue datasets. To address this, we propose the Visual Implicit Knowledge Distillation Framework (VIKDF), an innovative approach aimed at enhancing LLMs for enriched dialogue generation in zero-resource contexts by leveraging implicit multimodal knowledge. VIKDF comprises two main stages: knowledge distillation, using an Implicit Query Transformer to extract and encode visual implicit knowledge from image–text pairs into knowledge vectors; and knowledge integration, employing a novel Bidirectional Variational Information Fusion technique to seamlessly integrate these distilled vectors into LLMs. This enables the LLMs to generate dialogues that are not only coherent and engaging but also exhibit a deep understanding of the context through implicit multimodal cues, effectively overcoming the limitations of zero-resource scenarios. Our extensive experimentation across two dialogue datasets shows that VIKDF outperforms existing state-of-the-art models in generating high-quality dialogues. The code is available at https://github.com/zhangbo-nlp/VIKDF.
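One common way to integrate a distilled knowledge vector into an LLM is to project it into a short prefix of pseudo-token embeddings; the sketch below illustrates that general pattern. It is an assumption-laden stand-in, not VIKDF's Implicit Query Transformer or its Bidirectional Variational Information Fusion.

```python
import torch
import torch.nn as nn

# Loose sketch of prefix-style integration: project a distilled knowledge vector
# into a few pseudo-token embeddings and prepend them to the LLM's input sequence.

class KnowledgePrefix(nn.Module):
    def __init__(self, knowledge_dim=512, llm_dim=1024, n_prefix=8):
        super().__init__()
        self.project = nn.Linear(knowledge_dim, n_prefix * llm_dim)
        self.n_prefix, self.llm_dim = n_prefix, llm_dim

    def forward(self, knowledge_vec, token_embeddings):
        # knowledge_vec: (batch, knowledge_dim); token_embeddings: (batch, seq, llm_dim)
        prefix = self.project(knowledge_vec).view(-1, self.n_prefix, self.llm_dim)
        return torch.cat([prefix, token_embeddings], dim=1)
```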
Citations: 0
A homogeneous multimodality sentence representation for relation extraction
IF 14.7 | Tier 1, Computer Science | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2025-02-03 | DOI: 10.1016/j.inffus.2025.102968
Kai Wang, Yanping Chen, WeiZhe Yang, Yongbin Qin
Deep neural networks enable a sentence to be transformed into different modalities, such as a token sequence representation (a one-dimensional semantic representation) or a semantic plane (a two-dimensional semantic representation). The sequence representation has the advantage of learning the sequential dependencies of a sentence. The semantic plane is built by organizing all spans of a sentence, which is effective for resolving complicated sentence semantic structures. The two representations are derived from a homogeneous resource (the same sentence), yet they have been used separately in related works. In this paper, a homogeneous multimodality sentence representation is proposed to make full use of the semantic information in a sentence. We construct a homomodality model composed of three components: a sequential encoder that generates the sequential modality, a plane encoder that builds the plane modality, and a multimodality fusion component that aligns the homogeneous multimodalities to learn a multi-granularity semantic representation. Our model is evaluated on four public datasets for the relation extraction task. Compared with related works, it achieves state-of-the-art performance on all datasets. Analytical experiments show that fusing homogeneous multimodalities effectively exploits the full sentence information and advances the discriminability of a deep neural network.
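A small sketch of how a two-dimensional semantic plane can be built from a one-dimensional token sequence follows, where cell (i, j) represents the span from token i to token j. The endpoint-concatenation span operator is only one common choice, used here as an assumption for illustration.

```python
import torch

# Build a (seq_len, seq_len, 2*dim) "semantic plane" from a (seq_len, dim) token
# sequence; cell (i, j) is the concatenation of the span's start and end tokens.

def build_semantic_plane(token_reprs):
    seq_len, dim = token_reprs.shape
    starts = token_reprs.unsqueeze(1).expand(seq_len, seq_len, dim)  # row i = token i
    ends = token_reprs.unsqueeze(0).expand(seq_len, seq_len, dim)    # column j = token j
    plane = torch.cat([starts, ends], dim=-1)
    return plane

# Usage: plane = build_semantic_plane(torch.randn(12, 128))  # 12 tokens, dim 128
```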
Citations: 0
CSLP: A novel pansharpening method based on compressed sensing and L-PNN
IF 14.7 | Tier 1, Computer Science | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2025-02-03 | DOI: 10.1016/j.inffus.2025.103002
Yingxia Chen , Zhenglong Wan , Zeqiang Chen , Mingming Wei
To address the spectral distortion and loss of spatial detail caused by full-resolution pansharpening, this study proposes an unsupervised method, CSLP, that combines compressed sensing (CS) with a deeper attention-based network architecture (L-PNN). First, in the compressed sensing module, we apply sparse theory to image compression and reconstruction to reduce detail loss and enhance spatial and spectral resolution; convolutional neural networks are used to implement this process. It precisely extracts the inherent features of the image, mitigating spectral distortion in the pansharpened image and accelerating the model's convergence. Next, we employ the L-PNN module to further optimize and emphasize image features, improving the generalizability and stability of the model. The combined processing of these two modules significantly enhances the fidelity of both spatial and spectral resolution in pansharpening. To demonstrate the effectiveness of the proposed method, 19 different methods are compared on four datasets. The results show that the proposed method achieves outstanding performance in terms of both subjective evaluation and objective assessment metrics. The code is available at https://github.com/ahsore/CSLP.
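As a loose illustration of the compressed-sensing idea named above (compress, reconstruct, and encourage a sparse code with a CNN), a tiny codec with a reconstruction-plus-sparsity objective is sketched below; the layer sizes and loss weight are assumptions, not the CSLP architecture.

```python
import torch
import torch.nn as nn

# Tiny sketch of a CNN compress-and-reconstruct codec with a sparsity penalty,
# illustrating the compressed-sensing prior in a generic way.

class SparseCodec(nn.Module):
    def __init__(self, channels=4, code_channels=16):
        super().__init__()
        self.encode = nn.Conv2d(channels, code_channels, 3, stride=2, padding=1)
        self.decode = nn.ConvTranspose2d(code_channels, channels, 4, stride=2, padding=1)

    def forward(self, x):
        code = torch.relu(self.encode(x))   # compressed, non-negative code
        recon = self.decode(code)           # reconstructed image
        return recon, code

def cs_loss(x, recon, code, sparsity_weight=1e-3):
    # Reconstruction fidelity plus an L1 penalty that keeps the code sparse.
    return nn.functional.mse_loss(recon, x) + sparsity_weight * code.abs().mean()
```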
Citations: 0
Wi-Fi fine time measurement–Principles, applications, and future trends: A survey
IF 14.7 | Tier 1, Computer Science | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2025-02-03 | DOI: 10.1016/j.inffus.2025.102992
Yuan Wu , Mengyi He , Wei Li , Izzy Yi Jian , Yue Yu , Liang Chen , Ruizhi Chen
IEEE 802.11–2016 introduced the Wi-Fi Fine Time Measurement (FTM) protocol, which aims to provide meter- or sub-meter-level ranging between smart terminals and Wi-Fi access points (APs). Compared with other indoor positioning technologies such as Bluetooth, acoustic, visible light, and ultra-wideband, Wi-Fi is characterized by low cost, no additional infrastructure deployment, and potentially high positioning precision, especially with the enhancement of FTM, which makes Wi-Fi a competitive technology for the Internet of Things, indoor location-based services (iLBSs), smart cities, and many other fields. In this article, we present a comprehensive survey of Wi-Fi FTM technology, covering its working principle, positioning measurements, and a comparison of methods. We highlight current FTM-related localization methods, especially learning-based and multi-source fusion-based approaches. We then review real-world applications and existing commercial solutions, revealing possible directions for the industrialization of Wi-Fi FTM localization. Finally, we analyze open issues in Wi-Fi FTM positioning (e.g., capability, scalability, multipath, NLOS, device heterogeneity, and privacy) and discuss potential development trends.
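In simplified form, FTM ranging is a two-sided round-trip time measurement: the responder records when it sends the FTM frame (t1) and when it receives the acknowledgement (t4), while the initiator records reception (t2) and its reply (t3); subtracting the initiator's turnaround time and multiplying by the speed of light gives the distance. A minimal sketch (timestamps in seconds):

```python
# Standard RTT-based FTM distance estimate from the four exchanged timestamps.

SPEED_OF_LIGHT = 299_792_458.0  # m/s

def ftm_distance(t1, t2, t3, t4):
    rtt = (t4 - t1) - (t3 - t2)        # net round trip, turnaround time removed
    return SPEED_OF_LIGHT * rtt / 2.0  # one-way distance

# Example: a 100 ns net round trip corresponds to roughly 15 m.
print(ftm_distance(0.0, 40e-9, 60e-9, 120e-9))  # ~15 m
```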
Citations: 0
A maximum satisfaction-based feedback mechanism for non-cooperative behavior management with agreeableness personality traits detection in group decision making
IF 14.7 | Tier 1, Computer Science | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2025-02-01 | DOI: 10.1016/j.inffus.2025.102959
Yujia Liu , Yuwei Song , Jian Wu , Changyong Liang , Francisco Javier Cabrerizo
Non-cooperative behaviors can lead to consensus failure in group decision-making problems. As a result, managing non-cooperative behavior is a significant challenge in group consensus-reaching processes, which involves two main research questions: (1) How should non-cooperative behavior be defined? (2) How should an appropriate model be designed to manage it? Existing studies often overlook the psychological motivations behind non-cooperative behavior and achieve group consensus potentially at the expense of decision-makers’ satisfaction. To address these issues, this study proposes a novel maximum satisfaction-based feedback mechanism for managing non-cooperative behavior with personality trait prediction. For the first research question, a novel approach for identifying non-cooperative behavior is proposed that compares the solution of the Minimum Adjustment Consensus Model (MACM) with the maximum acceptable adjustment. The latter is defined by the decision maker’s Agreeableness trait within the Big Five personality framework, predicted by a CNN-BiLSTM model from the decision maker’s online reviews. For the second research question, a novel two-phase feedback mechanism is introduced to manage non-cooperative behaviors based on the satisfaction principle in decision-making: the first phase applies an adjustment rule for non-cooperative decision-makers, and the second phase applies an adjustment rule for cooperative decision-makers. Finally, this study presents a case study on selecting a supplier for a new energy vehicle enterprise to illustrate the effectiveness of the proposed model in real-world applications. Furthermore, sensitivity analysis and comparative assessments are conducted to demonstrate its advantages over traditional methods. The results indicate that the proposed method enhances both satisfaction and consensus levels compared with conventional non-cooperative consensus-reaching mechanisms.
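The identification rule described above (flag a decision maker as non-cooperative when the adjustment required by the consensus model exceeds the maximum adjustment their predicted Agreeableness makes acceptable) can be sketched schematically as below; the mapping from Agreeableness to tolerance is a made-up placeholder, not the paper's definition.

```python
# Schematic sketch of the identification rule: compare the adjustment the
# consensus model requires against an Agreeableness-derived tolerance.

def max_acceptable_adjustment(agreeableness, base_tolerance=0.2):
    # Placeholder mapping: higher agreeableness -> willing to move further.
    return base_tolerance * agreeableness  # agreeableness assumed in [0, 1]

def identify_non_cooperative(required_adjustments, agreeableness_scores):
    flags = {}
    for dm, required in required_adjustments.items():
        tolerance = max_acceptable_adjustment(agreeableness_scores[dm])
        flags[dm] = required > tolerance  # True -> flagged as non-cooperative
    return flags

# Usage with illustrative numbers (adjustments from a minimum-adjustment model):
print(identify_non_cooperative({"dm1": 0.05, "dm2": 0.18}, {"dm1": 0.9, "dm2": 0.3}))
```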
Citations: 0
EHR-based prediction modelling meets multimodal deep learning: A systematic review of structured and textual data fusion methods
IF 14.7 | Tier 1, Computer Science | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2025-02-01 | DOI: 10.1016/j.inffus.2025.102981
Ariel Soares Teles , Ivan Rodrigues de Moura , Francisco Silva , Angus Roberts , Daniel Stahl
Electronic Health Records (EHRs) have transformed healthcare by digitally consolidating patient medical history, encompassing structured data (e.g., demographic data, lab results), and unstructured textual data (e.g., clinical notes). These data hold significant potential for predictive modelling, and recent studies have dedicated efforts to leverage the different modalities in a cohesive and effective manner to improve predictive accuracy. This Systematic Literature Review (SLR) addresses the application of Multimodal Deep Learning (MDL) methods in EHR-based prediction modelling, specifically through the fusion of structured and textual data. Following PRISMA guidelines, we conducted a comprehensive literature search across six article databases, using a carefully designed search string. After applying inclusion and exclusion criteria, we selected 77 primary studies. Data extraction was standardized using a structured form based on the CHARMS checklist. We categorized and analysed the fusion strategies employed across the studies. By combining structured and textual data at the input level, early fusion enabled models to learn joint feature representations from the beginning, whether in vectorized representations or data textualization. Intermediate fusion, which delays integration, was particularly useful for tasks where each modality provides unique insights that need to be processed independently before being combined. Late fusion enabled modularity by integrating outputs from unimodal models, which is suitable when EHR structured and textual data have varying quality or reliability. We also identified trends and open issues that need attention. This review contributes a comprehensive understanding of EHR data fusion practices using MDL, highlighting potential pathways for future research and development in health informatics.
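To make the early/intermediate/late fusion patterns concrete, the sketch below applies them to a structured-feature vector and a clinical-note embedding; the encoders, dimensions, and equal late-fusion weights are placeholder assumptions rather than any reviewed pipeline.

```python
import torch
import torch.nn as nn

# Compact sketch of the three fusion patterns for EHR structured + textual data.

def early_fusion_input(x_struct, x_text):
    # Early fusion: concatenate inputs before any joint model sees them.
    return torch.cat([x_struct, x_text], dim=-1)

class IntermediateFusion(nn.Module):
    def __init__(self, struct_dim=32, text_dim=768, hidden=64):
        super().__init__()
        self.struct_enc = nn.Linear(struct_dim, hidden)
        self.text_enc = nn.Linear(text_dim, hidden)
        self.head = nn.Linear(2 * hidden, 1)

    def forward(self, x_struct, x_text):
        # Intermediate fusion: encode each modality separately, then merge.
        h = torch.cat([torch.relu(self.struct_enc(x_struct)),
                       torch.relu(self.text_enc(x_text))], dim=-1)
        return torch.sigmoid(self.head(h))

class LateFusion(nn.Module):
    def __init__(self, struct_dim=32, text_dim=768):
        super().__init__()
        self.struct_head = nn.Linear(struct_dim, 1)  # unimodal structured model
        self.text_head = nn.Linear(text_dim, 1)      # unimodal text model

    def forward(self, x_struct, x_text):
        # Late fusion: combine the two unimodal predictions.
        p_struct = torch.sigmoid(self.struct_head(x_struct))
        p_text = torch.sigmoid(self.text_head(x_text))
        return 0.5 * p_struct + 0.5 * p_text
```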
Citations: 0