Learning an effective joint representation is fundamental to Multimodal Sentiment Analysis (MSA). Existing studies typically adopt complex networks to construct joint multimodal representations directly, yet they often overlook the heterogeneity among modalities and the preservation of modality-specific information. Moreover, current methods tend to treat all modalities equally, failing to exploit the rich emotional cues in the text modality. To address these issues, we propose a Text-based Parallel Interaction Network (TPIN) that balances the commonality and specificity of different modalities. TPIN consists of two components: Modality-Common Information Processing (MCIP) and Modality-Specific Information Processing (MSIP). In MCIP, we propose a novel contrastive learning algorithm with Hard Negative Mining (HNM), which is integrated into our Two-Stage Contrastive Learning (TSCL) scheme to mitigate inter-modal heterogeneity. We further design a Text-Guided Dynamic Semantic Aggregation (TG-DSA) module to enable deep multimodal fusion under the guidance of the text modality. In MSIP, we devise a dynamic routing mechanism that iteratively optimizes routing weights to better capture modality-specific information in the visual and acoustic modalities. Experimental results demonstrate that our method achieves state-of-the-art performance on both the CMU-MOSI and CMU-MOSEI datasets, with consistent gains of 0.5%-1.2% across major evaluation metrics over recent advanced models.
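To make the contrastive component more concrete, the following is a minimal, hypothetical PyTorch sketch of an InfoNCE-style text-visual contrastive loss in which each anchor's negative set is restricted to its most similar in-batch negatives, illustrating the general idea of contrastive learning with hard negative mining. The function name and the `temperature` and `num_hard` parameters are illustrative assumptions, not the authors' implementation or the exact TSCL formulation.

```python
# Illustrative sketch only (not the authors' released code): InfoNCE-style
# contrastive loss between paired text and visual embeddings, with the
# negative set per anchor limited to its hardest in-batch negatives.
import torch
import torch.nn.functional as F


def contrastive_loss_with_hnm(text_emb, visual_emb, temperature=0.07, num_hard=8):
    """text_emb, visual_emb: (batch, dim) paired embeddings; returns a scalar loss."""
    text_emb = F.normalize(text_emb, dim=-1)
    visual_emb = F.normalize(visual_emb, dim=-1)

    # Pairwise cosine similarities between all text/visual pairs in the batch.
    sim = text_emb @ visual_emb.t() / temperature            # (batch, batch)
    batch_size = sim.size(0)
    pos = sim.diag()                                          # matched (positive) pairs

    # Mask the positive on the diagonal, then keep only the num_hard most
    # similar negatives per anchor (hard negative mining).
    diag_mask = torch.eye(batch_size, dtype=torch.bool, device=sim.device)
    neg = sim.masked_fill(diag_mask, float('-inf'))
    hard_neg, _ = neg.topk(min(num_hard, batch_size - 1), dim=-1)

    # InfoNCE over the positive and its selected hard negatives.
    logits = torch.cat([pos.unsqueeze(1), hard_neg], dim=1)
    labels = torch.zeros(batch_size, dtype=torch.long, device=sim.device)
    return F.cross_entropy(logits, labels)
```

In this toy setting, restricting the softmax denominator to the hardest negatives focuses the gradient on pairs that are most easily confused across modalities, which is the motivation the abstract gives for combining HNM with the two-stage contrastive scheme.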