Lifting wavelet transform-guided network with histogram attention for liver segmentation in CT scans
Pub Date: 2026-01-16 | DOI: 10.1016/j.inffus.2026.104153
Huaxiang Liu, Wei Sun, Youyao Fu, Shiqing Zhang, Jie Jin, Jiangxiong Fang, Binliang Wang
Accurate liver segmentation in computed tomography (CT) scans is crucial for the diagnosis of hepatocellular carcinoma and for surgical planning; however, manual delineation is laborious and prone to operator variability. Existing deep learning methods frequently sacrifice precise boundary delineation when expanding receptive fields or fail to leverage frequency-domain cues that encode global shape, while conventional attention mechanisms are less effective on low-contrast images. To address these challenges, we introduce LWT-Net, a novel network guided by a trainable lifting wavelet transform and equipped with a frequency-split histogram attention mechanism for liver segmentation. LWT-Net embeds the trainable lifting wavelet transform within an encoder-decoder framework to hierarchically decompose features into low-frequency components that capture global structure and high-frequency bands that preserve edge and texture details. A complementary inverse lifting stage reconstructs high-resolution features while maintaining spatial consistency. A frequency-spatial fusion module, driven by histogram-based attention, performs histogram-guided feature reorganization across global and local bins and employs self-attention to capture long-range dependencies and prioritize anatomically significant regions. Comprehensive evaluations on the LiTS2017, WORD, and FLARE22 datasets confirm LWT-Net's superior performance, with mean Dice similarity coefficients of 95.96%, 97.15%, and 95.97%, respectively.
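For illustration, here is a minimal sketch of one trainable lifting step (split, predict, update) with its exact inverse, the generic building block behind lifting wavelet transforms; the module and its convolutional predictor/updater are illustrative assumptions, not the LWT-Net implementation.

```python
import torch
import torch.nn as nn

class LiftingStep1D(nn.Module):
    """One trainable lifting step along the width axis: splits a feature map
    into a low-frequency approximation and a high-frequency detail band."""
    def __init__(self, channels):
        super().__init__()
        # Small learnable predictor and updater -- the "trainable" part of the lifting scheme.
        self.predict = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.update = nn.Conv2d(channels, channels, kernel_size=3, padding=1)

    def forward(self, x):                              # x: (B, C, H, W), W assumed even
        even, odd = x[..., 0::2], x[..., 1::2]         # split step
        detail = odd - self.predict(even)              # predict step -> high-frequency band
        approx = even + self.update(detail)            # update step  -> low-frequency band
        return approx, detail

    def inverse(self, approx, detail):
        """Inverse lifting: exactly reconstructs the even/odd samples."""
        even = approx - self.update(detail)
        odd = detail + self.predict(even)
        x = torch.zeros(*even.shape[:-1], even.shape[-1] * 2, device=even.device)
        x[..., 0::2], x[..., 1::2] = even, odd
        return x

# Usage: approx, detail = LiftingStep1D(64)(torch.randn(1, 64, 32, 32))
```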
{"title":"Lifting wavelet transform-guided network with histogram attention for liver segmentation in CT scans","authors":"Huaxiang Liu , Wei Sun , Youyao Fu , Shiqing Zhang , Jie Jin , Jiangxiong Fang , Binliang Wang","doi":"10.1016/j.inffus.2026.104153","DOIUrl":"10.1016/j.inffus.2026.104153","url":null,"abstract":"<div><div>Accurate liver segmentation in computed tomography (CT) scans is crucial for the diagnosis of hepatocellular carcinoma and surgical planning; however, manual delineation is laborious and prone to operator variability. Existing deep learning methods frequently sacrifice precise boundary delineation when expanding receptive fields or fail to leverage frequency-domain cues that encode global shape, while conventional attention mechanisms are less effective in processing low-contrast images. To address these challenges, we introduce LWT-Net, a novel network guided by a trainable lifting wavelet transform, incorporating a frequency-split histogram attention mechanism to enhance liver segmentation. LWT-Net incorporates a trainable lifting wavelet transform within an encoder-decoder framework to hierarchically decompose features into low-frequency components that capture global structure and high-frequency bands that preserve edge and texture details. A complementary inverse lifting stage reconstructs high-resolution features while maintaining spatial consistency. The frequency-spatial fusion module, driven by a histogram-based attention mechanism, performs histogram-guided feature reorganization across global and local bins, while employing self-attention to capture long-range dependencies and prioritize anatomically significant regions. Comprehensive evaluations on the LiTS2017, WORD, and FLARE22 datasets confirm LWT-Net’s superior performance, achieving mean Dice similarity coefficients of 95.96%, 97.15%, and 95.97%.</div></div>","PeriodicalId":50367,"journal":{"name":"Information Fusion","volume":"131 ","pages":"Article 104153"},"PeriodicalIF":15.5,"publicationDate":"2026-01-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145995209","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A novel knowledge distillation and hybrid explainability approach for phenology stage classification from multi-source time series
Pub Date: 2026-01-16 | DOI: 10.1016/j.inffus.2026.104158
Naeem Ullah, Andrés Manuel Chacón-Maldonado, Francisco Martínez-Álvarez, Ivanoe De Falco, Giovanna Sannino
Accurate phenological stage classification is crucial for addressing global challenges to food security posed by climate change, water scarcity, and land degradation. It enables precision agriculture by optimizing key interventions such as irrigation, fertilization, and pest control. While deep learning offers powerful tools, existing methods face four key limitations: reliance on narrow features and models, limited long-term forecasting capability, computational inefficiency, and opaque, unvalidated explanations. To overcome these limitations, this paper presents a deep learning framework for phenology classification that utilizes multi-source time series data from satellite imagery, meteorological stations, and field observations. The approach emphasizes temporal consistency, spatial adaptability, computational efficiency, and explainability. A feature engineering pipeline extracts temporal dynamics via lag features, rolling statistics, Fourier transforms, and seasonal encodings. Feature selection combines incremental strategies with classical filter, wrapper, and embedded methods. Deep learning models across multiple paradigms (feedforward, recurrent, convolutional, and attention-based) are benchmarked on multi-horizon forecasting tasks. To reduce model complexity while preserving performance where possible, the framework employs knowledge distillation, transferring predictive knowledge from complex teacher models to compact, deployable student models. For model interpretability, a new Hybrid SHAP-Association Rule Explainability approach is proposed, integrating model-driven and data-driven explanations. Agreement between the two views is quantified using trust metrics (precision@k, coverage, and Jaccard similarity) with a retraining-based validation mechanism. Experiments on phenology data from Andalusia demonstrate high accuracy, strong generalizability, trustworthy explanations, and resource-efficient phenology monitoring in agricultural systems.
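As a concrete illustration of the feature engineering pipeline described above, the sketch below builds lag features, rolling statistics, and Fourier-based seasonal encodings for a single time series; the column names ("ndvi", "doy") and window choices are hypothetical placeholders rather than the paper's schema.

```python
import numpy as np
import pandas as pd

def add_temporal_features(df, col="ndvi", lags=(1, 2, 4), windows=(3, 7), k_fourier=2):
    """Append lag, rolling-statistic, and Fourier seasonal features to a time-ordered frame."""
    out = df.copy()
    for lag in lags:                                   # lag features
        out[f"{col}_lag{lag}"] = out[col].shift(lag)
    for w in windows:                                  # rolling statistics
        out[f"{col}_mean{w}"] = out[col].rolling(w).mean()
        out[f"{col}_std{w}"] = out[col].rolling(w).std()
    doy = out["doy"] / 365.25                          # day of year scaled to [0, 1]
    for k in range(1, k_fourier + 1):                  # Fourier seasonal encodings
        out[f"sin_{k}"] = np.sin(2 * np.pi * k * doy)
        out[f"cos_{k}"] = np.cos(2 * np.pi * k * doy)
    return out

# Usage:
# df = pd.DataFrame({"ndvi": np.random.rand(50), "doy": np.arange(1, 51)})
# features = add_temporal_features(df)
```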
{"title":"A novel knowledge distillation and hybrid explainability approach for phenology stage classification from multi-source time series","authors":"Naeem Ullah , Andrés Manuel Chacón-Maldonado , Francisco Martínez-Álvarez , Ivanoe De Falco , Giovanna Sannino","doi":"10.1016/j.inffus.2026.104158","DOIUrl":"10.1016/j.inffus.2026.104158","url":null,"abstract":"<div><div>Accurate phenological stage classification is crucial for addressing global challenges to food security posed by climate change, water scarcity, and land degradation. It enables precision agriculture by optimizing key interventions such as irrigation, fertilization, and pest control. While deep learning offers powerful tools, existing methods face four key limitations: reliance on narrow features and models, limited long-term forecasting capability, computational inefficiency, and opaque, unvalidated explanations. To overcome these limitations, this paper presents a deep learning framework for phenology classification, utilizing multi-source time series data from satellite imagery, meteorological stations, and field observations. The approach emphasizes temporal consistency, spatial adaptability, computational efficiency, and explainability. A feature engineering pipeline extracts temporal dynamics via lag features, rolling statistics, Fourier transforms and seasonal encodings. Feature selection combines incremental strategies with classical filter, wrapper, and embedded methods. Deep learning models across multiple paradigms-feedforward, recurrent, convolutional, and attention-based-are benchmarked under multi-horizon forecasting tasks. To reduce model complexity while preserving performance where possible, the framework employs knowledge distillation, transferring predictive knowledge from complex teacher models to compact and deployable student models. For model interpretability, a new Hybrid SHAP-Association Rule Explainability approach is proposed, integrating model-driven and data-driven explanations. Agreement between views is quantified using trust metrics: precision@k, coverage, and Jaccard similarity, with a retraining-based validation mechanism. Experiments on phenology data from Andalusia demonstrate high accuracy, strong generalizability, trustworthy explanations and resource-efficient phenology monitoring in agricultural systems.</div></div>","PeriodicalId":50367,"journal":{"name":"Information Fusion","volume":"131 ","pages":"Article 104158"},"PeriodicalIF":15.5,"publicationDate":"2026-01-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145995208","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Fusion of quantum computing with smart agriculture: A systematic review of methods, implementation, applications, and challenges
Pub Date: 2026-01-16 | DOI: 10.1016/j.inffus.2026.104159
Sumit Kumar, Shashank Sheshar Singh, Gourav Bathla, Swati Sharma, Manisha Panjeta
The growing global population and the severity of environmental issues are driving the agriculture sector to adopt innovative technological advances for sustainable food production. Classical computing approaches frequently struggle with the volume and complexity of agricultural data when performing tasks such as crop yield prediction, disease detection, soil analysis, and weather forecasting. This Systematic Literature Review (SLR) provides an in-depth analysis of the evolving significance of quantum computing in smart agriculture. Quantum algorithms have the potential to reduce computational complexity and create novel data representation methods for high-dimensional challenges by leveraging quantum mechanics principles such as superposition and entanglement. This paper employs a structured research methodology based on eight specific research questions to comprehensively investigate over 100 peer-reviewed studies on quantum computing and smart agriculture published between 2012 and 2025. It demonstrates the effectiveness of Quantum Machine Learning (QML), quantum optimization, and hybrid quantum-classical models in various agricultural applications. The survey examines real-world implementations and compares existing quantum initiatives to classical benchmarks for classification and prediction tasks. The presented work identifies challenges and limitations of current quantum approaches. The paper outlines directions for future work, including the accessibility of quantum hardware and the development of domain-specific algorithms. To the best of our knowledge, this is the first research question-driven SLR that provides an in-depth analysis of how quantum computing can be applied in agricultural applications.
{"title":"Fusion of quantum computing with smart agriculture: A systematic review of methods, implementation, applications, and challenges","authors":"Sumit Kumar , Shashank Sheshar Singh , Gourav Bathla , Swati Sharma , Manisha Panjeta","doi":"10.1016/j.inffus.2026.104159","DOIUrl":"10.1016/j.inffus.2026.104159","url":null,"abstract":"<div><div>The growing global population and the severity of environmental issues are driving the agriculture sector to adopt innovative technological advances for sustainable food production. Classical computing approaches frequently struggle with the volume and complexity of agricultural data when performing tasks such as crop yield prediction, disease detection, soil analysis, and weather forecasting. This Systematic Literature Review (SLR) provides an in-depth analysis of the evolving significance of quantum computing in smart agriculture. Quantum algorithms have the potential to reduce computational complexity and create novel data representation methods for high-dimensional challenges by leveraging quantum mechanics principles such as superposition and entanglement. This paper employs a structured research methodology based on eight specific research questions to comprehensively investigate over 100 peer-reviewed studies on quantum computing and smart agriculture published between 2012 and 2025. It demonstrates the effectiveness of Quantum Machine Learning (QML), quantum optimization, and hybrid quantum-classical models in various agricultural applications. The survey examines real-world implementations and compares existing quantum initiatives to classical benchmarks for the classification and prediction tasks. The presented work identifies challenges and limitations of current quantum approaches. The paper outlines directions for future work, including the accessibility of quantum hardware and the development of domain-specific algorithms. To the best of our knowledge, this is the first research question-driven SLR that provides an in-depth analysis of how quantum computing can be applied in agricultural applications.</div></div>","PeriodicalId":50367,"journal":{"name":"Information Fusion","volume":"131 ","pages":"Article 104159"},"PeriodicalIF":15.5,"publicationDate":"2026-01-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145995251","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Negative can be positive: A stable and noise-resistant complementary contrastive learning for cross-modal matching
Pub Date: 2026-01-16 | DOI: 10.1016/j.inffus.2026.104156
Fangming Zhong, Xinyu He, Haiquan Yu, Xiu Liu, Suhua Zhang
Cross-modal matching with noisy correspondence has drawn considerable interest recently, owing to the mismatched pairs inevitably introduced when data are collected from the Internet. Training on such noisy data often leads to severe performance degradation, as conventional methods tend to overfit rapidly to mismatched pairs. Most existing methods focus on predicting more reliable soft correspondences, assigning higher weights to pairs that are more likely to be correct. However, two limitations remain: (1) they ignore the informative signals embedded in negative pairs, and (2) they are unstable because of their sensitivity to the noise ratio. To address these issues, we explicitly take negatives into account and propose a stable and noise-resistant complementary learning method, named Dual Contrastive Learning (DCL), for cross-modal matching with noisy correspondence. DCL leverages both positive and negative pairs to improve robustness; with complementary contrastive learning, the negative pairs also contribute positively to model optimization. Specifically, to fully exploit the potential of mismatched data, we first partition the training data into clean and noisy subsets based on the memorization effect of deep neural networks. We then employ vanilla contrastive learning for matched positive pairs in the clean subset, while complementary contrastive learning is adopted for negative pairs, including those in the noisy subset. In this way, the proposed method remains robust across noise ratios, balancing positive and negative information. Extensive experiments indicate that DCL significantly outperforms state-of-the-art methods and exhibits remarkable stability, with extremely low variance in R@1. Specifically, the R@1 scores of DCL are 7% and 9.1% higher than those of NPC on image-to-text and text-to-image matching, respectively. The source code is released at https://github.com/hxy2969/dcl.
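To make the positive/complementary split concrete, the sketch below pairs a vanilla contrastive (InfoNCE) term on pairs flagged as clean with a complementary term that pushes probability mass away from pairs flagged as noisy; this is an illustrative assumption about how such a dual objective can be written, not the exact DCL loss.

```python
import torch
import torch.nn.functional as F

def dual_contrastive_loss(img_emb, txt_emb, is_clean, tau=0.07):
    """img_emb, txt_emb: (N, d) L2-normalized embeddings of paired samples.
    is_clean: (N,) boolean mask from the clean/noisy data partition."""
    logits = img_emb @ txt_emb.t() / tau               # (N, N) cross-modal similarities
    p_match = logits.softmax(dim=1).diagonal()         # probability assigned to the given pairing
    targets = torch.arange(len(img_emb), device=img_emb.device)

    loss = 0.0
    if is_clean.any():                                 # clean pairs: pull matched pairs together
        loss = loss + F.cross_entropy(logits[is_clean], targets[is_clean])
    if (~is_clean).any():                              # noisy pairs: complementary term that
        loss = loss - torch.log(1.0 - p_match[~is_clean] + 1e-6).mean()  # pushes the (likely wrong) pairing apart
    return loss
```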
{"title":"Negative can be positive: A stable and noise-resistant complementary contrastive learning for cross-modal matching","authors":"Fangming Zhong , Xinyu He , Haiquan Yu , Xiu Liu , Suhua Zhang","doi":"10.1016/j.inffus.2026.104156","DOIUrl":"10.1016/j.inffus.2026.104156","url":null,"abstract":"<div><div>Cross-modal matching with noisy correspondence has drawn considerable interest recently, due to the mismatched data imposed inevitably when collecting data from the Internet. Training on such noisy data often leads to severe performance degradation, as conventional methods tend to overfit rapidly to wrongly mismatched pairs. Most of the existing methods focus on predicting more reliable soft correspondence, generating higher weights for the pairs that are more likely to be correct. However, there still remain two limitations: (1) they ignore the informative signals embedded in the negative pairs, and (2) the instability of existing methods due to their sensitivity to the noise ratio. To address these issues, we explicitly take the negatives into account and propose a stable and noise-resistant complementary learning method, named Dual Contrastive Learning (DCL), for cross-modal matching with noisy correspondence. DCL leverages both positive pairs and negative pairs to improve the robustness. With the complementary contrastive learning, the negative pairs also contribute positively to the model optimization. Specifically, to fully explore the potential of mismatched data, we first partition the training data into clean and noisy subsets based on the memorization effect of deep neural networks. Then, we employ vanilla contrastive learning for positive matched pairs in the clean subset. As for negative pairs including the noisy subsets, complementary contrastive learning is adopted. In such doing, whatever the level of noise ratio is, the proposed method is robust to balance the positive information and negative information. Extensive experiments indicate that DCL significantly outperforms the state-of-the-art methods and exhibits remarkable stability with an extremely low variance of R@1. Specifically, the R@1 scores of our DCL are 7% and 9.1% higher than NPC on image-to-text and text-to-image, respectively. The source code is released at <span><span>https://github.com/hxy2969/dcl</span><svg><path></path></svg></span>.</div></div>","PeriodicalId":50367,"journal":{"name":"Information Fusion","volume":"130 ","pages":"Article 104156"},"PeriodicalIF":15.5,"publicationDate":"2026-01-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145995210","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
MulMoSenT: Multimodal sentiment analysis for a low-resource language using textual-visual cross-attention and fusion
Pub Date: 2026-01-15 | DOI: 10.1016/j.inffus.2026.104129
Sadia Afroze, Md. Rajib Hossain, Mohammed Moshiul Hoque, Nazmul Siddique
The widespread availability of the Internet and the growing use of smart devices have fueled the rapid expansion of multimodal (image-text) sentiment analysis (MSA), a burgeoning research field driven by the massive volume of image-text data these technologies generate. However, MSA faces significant challenges, notably misalignment between images and text, where an image may carry multiple interpretations or contradict its paired text. In addition, short textual content often lacks sufficient context, complicating sentiment prediction. These issues are particularly acute in low-resource languages, where annotated image-text corpora are scarce and Vision-Language Models (VLMs) and Large Language Models (LLMs) exhibit limited performance. This research introduces MulMoSenT, a multimodal image-text sentiment analysis system tailored to tackle these challenges for low-resource languages. The development of MulMoSenT unfolds across four key phases: corpus development, baseline model evaluation and selection, hyperparameter adaptation, and model fine-tuning and inference. The proposed MulMoSenT model achieves a peak accuracy of 84.90%, surpassing all baseline models, and delivers a 37.83% improvement over VLMs, a 35.28% gain over image-only models, and a 0.71% enhancement over text-only models. Both the dataset and the solution are publicly accessible at: https://github.com/sadia-afroze/MulMoSenT.
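A minimal sketch of the textual-visual cross-attention fusion idea: text token features attend to image patch features and the attended result is fused back into the text stream before classification; dimensions and layer choices here are illustrative assumptions, not MulMoSenT's exact architecture.

```python
import torch
import torch.nn as nn

class CrossAttentionFusion(nn.Module):
    def __init__(self, dim=256, heads=4, num_classes=3):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)
        self.classifier = nn.Linear(dim, num_classes)

    def forward(self, text_feats, image_feats):
        # text_feats: (B, T, dim) token features; image_feats: (B, P, dim) patch features
        attended, _ = self.attn(query=text_feats, key=image_feats, value=image_feats)
        fused = self.norm(text_feats + attended)       # residual fusion of the two modalities
        return self.classifier(fused.mean(dim=1))      # pooled sentiment logits

# Usage: logits = CrossAttentionFusion()(torch.randn(2, 16, 256), torch.randn(2, 49, 256))
```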
{"title":"MulMoSenT: Multimodal sentiment analysis for a low-resource language using textual-visual cross-attention and fusion","authors":"Sadia Afroze , Md. Rajib Hossain , Mohammed Moshiul Hoque , Nazmul Siddique","doi":"10.1016/j.inffus.2026.104129","DOIUrl":"10.1016/j.inffus.2026.104129","url":null,"abstract":"<div><div>The widespread availability of the Internet and the growing use of smart devices have fueled the rapid expansion of multimodal (image-text) sentiment analysis (MSA), a burgeoning research field. This growth is driven by the massive volume of image-text data generated by these technologies. However, MSA faces significant challenges, notably the misalignment between images and text, where an image may carry multiple interpretations or contradict its paired text. In addition, short textual content often lacks sufficient context, complicating sentiment prediction. These issues are particularly acute in low-resource languages, where annotated image-text corpora are scarce, and Vision-Language Models (VLMs) and Large Language Models (LLMs) exhibit limited performance. This research introduces <strong>MulMoSenT</strong>, a multimodal image-text sentiment analysis system tailored to tackle these challenges for low-resource languages. The development of <strong>MulMoSenT</strong> unfolds across four key phases: corpus development, baseline model evaluation and selection, hyperparameter adaptation, and model fine-tuning and inference. The proposed <strong>MulMoSenT</strong> model achieves a peak accuracy of 84.90%, surpassing all baseline models. Delivers a 37. 83% improvement over VLMs, a 35.28% gain over image-only models, and a 0.71% enhancement over text-only models. Both the dataset and the solution are publicly accessible at: <span><span>https://github.com/sadia-afroze/MulMoSenT</span><svg><path></path></svg></span>.</div></div>","PeriodicalId":50367,"journal":{"name":"Information Fusion","volume":"131 ","pages":"Article 104129"},"PeriodicalIF":15.5,"publicationDate":"2026-01-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145995211","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
ExInCOACH: Strategic exploration meets interactive tutoring for context-aware game onboarding
Pub Date: 2026-01-14 | DOI: 10.1016/j.inffus.2026.104151
Rui Hua, Zhaoyu Huang, Jinhao Lu, Yakun Li, Na Zhao
Traditional game tutorials often fail to deliver real-time contextual guidance, providing static instructions disconnected from dynamic gameplay states. This limitation stems from their inability to interpret evolving game environments and generate high-quality decisions during live player interactions. We present ExInCOACH, a hybrid framework that synergizes exploratory reinforcement learning (RL) with interactive large language models (LLMs) to enable state-aware adaptive tutoring. Our framework first employs deep RL to discover strategic patterns via self-play, constructing a Q-function. During player onboarding, LLMs map the Q-values of currently legal actions and their usage conditions into natural language rule explanations and strategic advice by analyzing live game states and player decisions.
Evaluations in Dou Di Zhu (a turn-based card game) reveal that learners using ExInCOACH experienced intuitive strategy internalization: all participants reported grasping advanced tactics faster than through rule-based tutorials, while most players highly valued the real-time contextual feedback. A comparative study demonstrated that players trained with ExInCOACH achieved a 70% win rate (14 wins/20 games) against those onboarded via traditional methods, as they benefited from adaptive guidance that evolved with their skill progression. To further validate the framework's generalizability, evaluations were also conducted in StarCraft II, a high-complexity real-time strategy (RTS) game. In 2v2 cooperative battles, teams trained with ExInCOACH achieved a 66.7% win rate against teams assisted by Vision LLMs (VLLMs) and a 100% win rate against teams relying on traditional static game wikis for learning. Cognitive load assessments indicated that ExInCOACH significantly reduced players' mental burden and frustration in complex scenarios involving real-time decision-making and multi-unit collaboration, while also outperforming traditional methods in information absorption efficiency and tactical adaptability. This work proposes a game tutorial design paradigm based on RL model exploration and LLM rule interpretation, making AI-generated strategies accessible through natural language interaction tailored to individual learning contexts.
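A minimal sketch of the RL-to-LLM handoff described above: rank the currently legal actions by their Q-values and package them into a natural-language coaching prompt; `q_function` and `query_llm` are hypothetical placeholders, not ExInCOACH's API.

```python
def build_coaching_prompt(state_description, legal_actions, q_function, top_k=3):
    """Turn Q-values over legal actions into a prompt an LLM can explain to the player."""
    scored = sorted(((q_function(state_description, a), a) for a in legal_actions),
                    key=lambda qa: qa[0], reverse=True)
    ranked = "\n".join(f"{i + 1}. {a} (Q={q:.2f})"
                       for i, (q, a) in enumerate(scored[:top_k]))
    return (
        "You are a game tutor. Current state:\n"
        f"{state_description}\n"
        "The strongest legal moves according to the learned value function are:\n"
        f"{ranked}\n"
        "Explain in plain language why the top move is strong and when to use it."
    )

# advice = query_llm(build_coaching_prompt(state, actions, q_function))  # hypothetical LLM call
```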
{"title":"ExInCOACH: Strategic exploration meets interactive tutoring for context-aware game onboarding","authors":"Rui Hua , Zhaoyu Huang , Jinhao Lu , Yakun Li , Na Zhao","doi":"10.1016/j.inffus.2026.104151","DOIUrl":"10.1016/j.inffus.2026.104151","url":null,"abstract":"<div><div>Traditional game tutorials often fail to deliver real-time contextual guidance, providing static instructions disconnected from dynamic gameplay states. This limitation stems from their inability to interpret evolving game environments and generate high-quality decisions during live player interactions. We present ExInCOACH, a hybrid framework that synergizes exploratory reinforcement learning (RL) with interactive large language models (LLMs) to enable state-aware adaptive tutoring. Our framework first employs deep RL to discover strategic patterns via self-play, constructing a Q-function. During player onboarding, LLMs map the Q-values of currently legal actions and their usage conditions into natural language rule explanations and strategic advice by analyzing live game states and player decisions.</div><div>Evaluations in Dou Di Zhu (a turn-based card game) reveal that learners using ExInCOACH experienced intuitive strategy internalization-all participants reported grasping advanced tactics faster than through rule-based tutorials, while most players highly valued the real-time contextual feedback. A comparative study demonstrated that players trained with ExInCOACH achieved a 70% win rate (14 wins/20 games) against those onboarded via traditional methods, as they benefited from adaptive guidance that evolved with their skill progression. To further validate the framework’s generalizability, evaluations were also conducted in StarCraft II, a high-complexity real-time strategy (RTS) game. In 2v2 cooperative battles, teams trained with ExInCOACH achieved a 66.7% win rate against teams assisted by Vision LLMs (VLLMs) and an impressive 100% win rate against teams relying on traditional static game wikis for learning. Cognitive load assessments indicated that ExInCOACH significantly reduced players- mental burden and frustration in complex scenarios involving real-time decision-making and multi-unit collaboration, while also outperforming traditional methods in information absorption efficiency and tactical adaptability. This work proposes a game tutorial design paradigm based on RL model exploration & LLM rule interpretation, making AI-generated strategies accessible through natural language interaction tailored to individual learning contexts.</div></div>","PeriodicalId":50367,"journal":{"name":"Information Fusion","volume":"131 ","pages":"Article 104151"},"PeriodicalIF":15.5,"publicationDate":"2026-01-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145995212","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
GeoCraft: A Diffusion Model-based 3D Reconstruction Method driven by image and point cloud fusion
Pub Date: 2026-01-13 | DOI: 10.1016/j.inffus.2026.104149
Weixuan Ma, Yamin Li, Chujin Liu, Hao Zhang, Jie Li, Kansong Chen, Weixuan Gao
With the rapid development of technologies such as virtual reality (VR), autonomous driving, and digital twins, the demand for high-precision, realistic multimodal 3D reconstruction has surged. This technology has become a core research focus in computer vision and graphics due to its ability to integrate multi-source data such as 2D images and point clouds. However, existing methods face challenges including geometric inconsistency in single-view reconstruction, poor point cloud-to-mesh conversion, and insufficient multimodal feature fusion, limiting their practical application. To address these issues, this paper proposes GeoCraft, a multimodal 3D reconstruction method that generates high-precision 3D models from 2D images through three collaborative stages: Diff2DPoint, Point2DMesh, and Vision3DGen. Specifically, Diff2DPoint generates an initial, geometrically aligned point cloud using a diffusion model and projection feature fusion; Point2DMesh converts the point cloud into a high-quality mesh using an autoregressive decoder-only Transformer and Direct Preference Optimization (DPO); and Vision3DGen creates high-fidelity 3D objects through multimodal feature alignment. Experiments on the Google Scanned Objects (GSO) and Pix3D datasets show that GeoCraft excels on key metrics: on GSO, its CMMD is 2.810 and FID-CLIP is 26.420; on Pix3D, CMMD is 3.020 and FID-CLIP is 27.030. GeoCraft significantly outperforms existing 3D reconstruction methods and also demonstrates advantages in computational efficiency, effectively addressing key challenges in 3D reconstruction. The code is available at https://github.com/weixuanma/GeoCraft.
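For context on the evaluation, the sketch below computes a Gaussian-kernel maximum mean discrepancy between CLIP embeddings of real and generated renderings, the kind of distance underlying CMMD-style scores; the kernel choice and bandwidth are assumptions, not the paper's exact protocol.

```python
import numpy as np

def gaussian_kernel(x, y, sigma=10.0):
    """Pairwise Gaussian kernel between (n, d) and (m, d) embedding sets."""
    d2 = ((x[:, None, :] - y[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2.0 * sigma ** 2))

def mmd2(real_emb, gen_emb, sigma=10.0):
    """Biased squared-MMD estimate; lower means the generated set matches the real set better."""
    kxx = gaussian_kernel(real_emb, real_emb, sigma).mean()
    kyy = gaussian_kernel(gen_emb, gen_emb, sigma).mean()
    kxy = gaussian_kernel(real_emb, gen_emb, sigma).mean()
    return kxx + kyy - 2.0 * kxy

# Usage: score = mmd2(np.random.randn(100, 512), np.random.randn(100, 512))
```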
{"title":"GeoCraft: A Diffusion Model-based 3D Reconstruction Method driven by image and point cloud fusion","authors":"Weixuan Ma , Yamin Li , Chujin Liu , Hao Zhang , Jie Li , Kansong Chen , Weixuan Gao","doi":"10.1016/j.inffus.2026.104149","DOIUrl":"10.1016/j.inffus.2026.104149","url":null,"abstract":"<div><div>With the rapid development of technologies like virtual reality (VR), autonomous driving, and digital twins, the demand for high-precision and realistic multimodal 3D reconstruction has surged. This technology has become a core research focus in computer vision and graphics due to its ability to integrate multi-source data, such as 2D images and point clouds. However, existing methods face challenges such as geometric inconsistency in single-view reconstruction, poor point cloud-to-mesh conversion, and insufficient multimodal feature fusion, limiting their practical application. To address these issues, this paper proposes GeoCraft, a multimodal 3D reconstruction method that generates high-precision 3D models from 2D images through three collaborative stages: Diff2DPoint, Point2DMesh, and Vision3DGen. Specifically, Diff2DPoint generates an initial point cloud with geometric alignment using a diffusion model and projection feature fusion; Point2DMesh converts the point cloud into a high-quality mesh using an autoregressive decoder-only Transformer and Direct Preference Optimization (DPO); Vision3DGen creates high-fidelity 3D objects through multimodal feature alignment. Experiments on the Google Scanned Objects (GSO) and Pix3D datasets show that GeoCraft excels in key metrics. On the GSO dataset, its CMMD is 2.810 and FID<sub>CLIP</sub> is 26.420; on Pix3D, CMMD is 3.020 and FID<sub>CLIP</sub> is 27.030. GeoCraft significantly outperforms existing 3D reconstruction methods and also demonstrates advantages in computational efficiency, effectively solving key challenges in 3D reconstruction.The code is available at <span><span>https://github.com/weixuanma/GeoCraft</span><svg><path></path></svg></span>.</div></div>","PeriodicalId":50367,"journal":{"name":"Information Fusion","volume":"131 ","pages":"Article 104149"},"PeriodicalIF":15.5,"publicationDate":"2026-01-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145961755","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
GC-Fed: Gradient centralized federated learning with partial client participation
Pub Date: 2026-01-13 | DOI: 10.1016/j.inffus.2026.104148
Jungwon Seo, Ferhat Ozgur Catak, Chunming Rong, Kibeom Hong, Minhoe Kim
Federated Learning (FL) enables privacy-preserving multi-source information fusion (MSIF) but suffers from client drift in highly heterogeneous data settings. Many existing approaches mitigate drift by providing clients with common reference points, typically derived from past information, to align objectives or gradient directions. However, under severe partial participation, such history-dependent references may become unreliable, as the set of client data distributions participating in each round can vary drastically. To overcome this limitation, we propose a method that mitigates client drift without relying on past information by constraining the update space through Gradient Centralization (GC). Specifically, we introduce Local GC and Global GC, which apply GC at the local and global update stages, respectively, and further present GC-Fed, a hybrid formulation that generalizes both. Theoretical analysis and extensive experiments on benchmark FL tasks demonstrate that GC-Fed effectively alleviates client drift and achieves up to a 20% accuracy improvement under data-heterogeneous and partial-participation conditions.
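A minimal sketch of the Gradient Centralization operation itself: each multi-dimensional weight gradient is centered by subtracting its mean over the non-output dimensions before the optimizer step. Placing this inside a client's local update loop illustrates the idea behind Local GC; the exact GC-Fed placement and the global variant follow the paper, not this sketch.

```python
import torch

def centralize_gradients(model):
    """Center every multi-dimensional weight gradient (skip biases / 1-D parameters)."""
    for p in model.parameters():
        if p.grad is None or p.grad.dim() <= 1:
            continue
        dims = tuple(range(1, p.grad.dim()))           # all dims except the output dimension
        p.grad -= p.grad.mean(dim=dims, keepdim=True)

# Inside a client's local training step (sketch):
# loss.backward()
# centralize_gradients(model)    # constrain the update direction before stepping
# optimizer.step()
```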
{"title":"GC-Fed: Gradient centralized federated learning with partial client participation","authors":"Jungwon Seo , Ferhat Ozgur Catak , Chunming Rong , Kibeom Hong , Minhoe Kim","doi":"10.1016/j.inffus.2026.104148","DOIUrl":"10.1016/j.inffus.2026.104148","url":null,"abstract":"<div><div>Federated Learning (FL) enables privacy-preserving multi-source information fusion (MSIF) but suffers from client drift in highly heterogeneous data settings. Many existing approaches mitigate drift by providing clients with common reference points, typically derived from past information, to align objectives or gradient directions. However, under severe partial participation, such history-dependent references may become unreliable, as the set of client data distributions participating in each round can vary drastically. To overcome this limitation, we propose a method that mitigates client drift without relying on past information by constraining the update space through Gradient Centralization (GC). Specifically, we introduce <span>Local GC</span> and <span>Global GC</span>, which apply GC at the local and global update stages, respectively, and further present <span>GC-Fed</span>, a hybrid formulation that generalizes both. Theoretical analysis and extensive experiments on benchmark FL tasks demonstrate that <span>GC-Fed</span> effectively alleviates client drift and achieves up to 20 % accuracy improvement under data heterogeneous and partial participation conditions.</div></div>","PeriodicalId":50367,"journal":{"name":"Information Fusion","volume":"131 ","pages":"Article 104148"},"PeriodicalIF":15.5,"publicationDate":"2026-01-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145962592","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
SymUnet-DynCFC: Multimodal MRI fusion for robust cartilage segmentation and clinically confirmed moderate-to-severe KOA diagnosis
Pub Date: 2026-01-12 | DOI: 10.1016/j.inffus.2026.104145
Li Li, Jianbing Ma, Beiji Zou, Hao Xu, Shenghui Liao, Wenyi Xiong, Liqiang Zhi
Knee osteoarthritis (KOA) is a globally prevalent degenerative joint disorder. A central challenge in its automated diagnosis is the efficient fusion of multimodal MRI data. This fusion aims to enhance the accuracy and generalizability of clinical cartilage segmentation, while simultaneously minimizing healthcare resource consumption. Therefore, this study introduces dynamic confidence fuzzy control (DynCFC) within the symmetric unet architecture (SymUnet), referred to as SymUnet-DynCFC, which is designed to enhance the accuracy and robustness of cartilage segmentation. Firstly, the SymUnet architecture is developed, with separate inputs from T1W and T2W modalities to facilitate comprehensive segmentation evaluation. Secondly, the DynCFC mechanism is implemented to compute the optimal weighting for each modality, enabling the fusion and optimization of multimodal features. Finally, the performance of the proposed SymUnet-DynCFC method is evaluated on clinical datasets from a multi-campus hospital system. Experimental results show that SymUnet-DynCFC achieves better segmentation performance than the baselines, with mean Dice, IoU, and HD95 values of 87.96 %, 79.93 %, and 1.29, respectively. In particular, SymUnet-DynCFC exhibits improved robustness compared to the baseline methods. This may facilitate automated cartilage segmentation in clinical workflows and could support the assessment of moderate-to-severe KOA by detecting outlier metrics.
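A minimal sketch of confidence-weighted fusion of the two MRI streams: per-modality confidence scores are normalized into weights that blend the T1W and T2W feature maps; this illustrates the weighting idea only, not the DynCFC fuzzy-control rules.

```python
import torch
import torch.nn as nn

class ConfidenceWeightedFusion(nn.Module):
    def __init__(self, channels):
        super().__init__()
        # One scalar confidence per modality, predicted from globally pooled features.
        self.conf_t1 = nn.Linear(channels, 1)
        self.conf_t2 = nn.Linear(channels, 1)

    def forward(self, feat_t1, feat_t2):               # both: (B, C, H, W)
        pooled_t1 = feat_t1.mean(dim=(2, 3))            # (B, C) global average pooling
        pooled_t2 = feat_t2.mean(dim=(2, 3))
        conf = torch.cat([self.conf_t1(pooled_t1), self.conf_t2(pooled_t2)], dim=1)
        w = conf.softmax(dim=1)                          # (B, 2) normalized modality weights
        return w[:, 0, None, None, None] * feat_t1 + w[:, 1, None, None, None] * feat_t2

# Usage: fused = ConfidenceWeightedFusion(32)(torch.randn(2, 32, 64, 64), torch.randn(2, 32, 64, 64))
```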
{"title":"SymUnet-DynCFC: Multimodal MRI fusion for robust cartilage segmentation and clinically confirmed moderate-to-severe KOA diagnosis","authors":"Li Li , Jianbing Ma , Beiji Zou , Hao Xu , Shenghui Liao , Wenyi Xiong , Liqiang Zhi","doi":"10.1016/j.inffus.2026.104145","DOIUrl":"10.1016/j.inffus.2026.104145","url":null,"abstract":"<div><div>Knee osteoarthritis (KOA) is a globally prevalent degenerative joint disorder. A central challenge in its automated diagnosis is the efficient fusion of multimodal MRI data. This fusion aims to enhance the accuracy and generalizability of clinical cartilage segmentation, while simultaneously minimizing healthcare resource consumption. Therefore, this study introduces dynamic confidence fuzzy control (DynCFC) within the symmetric unet architecture (SymUnet), referred to as SymUnet-DynCFC, which is designed to enhance the accuracy and robustness of cartilage segmentation. Firstly, the SymUnet architecture is developed, with separate inputs from T1W and T2W modalities to facilitate comprehensive segmentation evaluation. Secondly, the DynCFC mechanism is implemented to compute the optimal weighting for each modality, enabling the fusion and optimization of multimodal features. Finally, the performance of the proposed SymUnet-DynCFC method is evaluated on clinical datasets from a multi-campus hospital system. Experimental results show that SymUnet-DynCFC achieves better segmentation performance than the baselines, with mean Dice, IoU, and HD95 values of 87.96 %, 79.93 %, and 1.29, respectively. In particular, SymUnet-DynCFC exhibits improved robustness compared to the baseline methods. This may facilitate automated cartilage segmentation in clinical workflows and could support the assessment of moderate-to-severe KOA by detecting outlier metrics.</div></div>","PeriodicalId":50367,"journal":{"name":"Information Fusion","volume":"130 ","pages":"Article 104145"},"PeriodicalIF":15.5,"publicationDate":"2026-01-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145957304","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Data science: a natural ecosystem
Pub Date: 2026-01-12 | DOI: 10.1016/j.inffus.2025.104113
Emilio Porcu, Roy El Moukari, Laurent Najman, Francisco Herrera, Horst Simon
This manuscript provides a systemic and data-centric view of what we term essential data science, as a natural ecosystem whose challenges and missions stem from fusing the data universe, with its multiple combinations of the 5D complexities (data structure, domain, cardinality, causality, and ethics), with the phases of the data life cycle. Data agents perform tasks driven by specific goals. The data scientist is an abstract entity that emerges from the logical organization of data agents and their actions. Data scientists face challenges that are defined according to the missions. We define specific discipline-induced data science, which in turn allows for the definition of pan-data science, a natural ecosystem that integrates specific disciplines with essential data science. We semantically split essential data science into computational and foundational branches. By formalizing this ecosystemic view, we contribute a general-purpose, fusion-oriented architecture for integrating heterogeneous knowledge, agents, and workflows, relevant to a wide range of disciplines and high-impact applications.
{"title":"Data science: a natural ecosystem","authors":"Emilio Porcu , Roy El Moukari , Laurent Najman , Francisco Herrera , Horst Simon","doi":"10.1016/j.inffus.2025.104113","DOIUrl":"10.1016/j.inffus.2025.104113","url":null,"abstract":"<div><div>This manuscript provides a systemic and data-centric view of what we term <em>essential</em> data science, as a <em>natural</em> ecosystem with challenges and missions stemming from the fusion of data universe with its multiple combinations of the 5D complexities (data structure, domain, cardinality, causality, and ethics) with the phases of the data life cycle. Data agents perform tasks driven by specific <em>goals</em>. The data scientist is an abstract entity that comes from the logical organization of data agents with their actions. Data scientists face challenges that are defined according to the <em>missions</em>. We define specific discipline-induced data science, which in turn allows for the definition of <em>pan</em>-data science, a natural ecosystem that integrates specific disciplines with the essential data science. We semantically split the essential data science into computational, and foundational. By formalizing this ecosystemic view, we contribute a general-purpose, fusion-oriented architecture for integrating heterogeneous knowledge, agents, and workflows-relevant to a wide range of disciplines and high-impact applications.</div></div>","PeriodicalId":50367,"journal":{"name":"Information Fusion","volume":"130 ","pages":"Article 104113"},"PeriodicalIF":15.5,"publicationDate":"2026-01-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145957302","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}