
Latest publications in Information Fusion

Images, normal maps and point clouds fusion decoder for 6D pose estimation
IF 18.6 | CAS Tier 1 (Computer Science) | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2025-01-01 | DOI: 10.1016/j.inffus.2024.102907
Hong-Bo Zhang, Jia-Xin Hong, Jing-Hua Liu, Qing Lei, Ji-Xiang Du
6D pose estimation plays a crucial role in enabling intelligent robots to interact with their environment by understanding 3D scene information. This task is challenging due to factors such as texture-less objects, illumination variations, and scene occlusions. In this work, we present a novel approach that integrates feature fusion from multiple data modalities—specifically, RGB images, normal maps, and point clouds—to enhance the accuracy of 6D pose estimation. Unlike previous methods that rely solely on RGB-D data or focus on either shallow or deep feature fusion, the proposed method uniquely incorporates both shallow and deep feature fusion across heterogeneous modalities, compensating for the information often lost in point clouds. Specifically, the proposed method includes an adaptive feature fusion module designed to improve the communication and fusion of shallow features between RGB images and normal maps. Additionally, a multi-modal fusion decoder is implemented to facilitate cross-modal feature fusion between image and point cloud data. Experimental results demonstrate that the proposed method achieves state-of-the-art performance, with 6D pose estimation accuracy reaching 97.7% on the Linemod dataset, 71.5% on the Occlusion Linemod dataset, and 95.8% on the YCB-Video dataset. These results underline the robustness and effectiveness of the proposed approach in complex environments.
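The adaptive shallow-feature fusion idea can be illustrated with a minimal sketch. This is not the paper's actual module: the gate parameters `Wg`, `bg` and the feature shapes are illustrative assumptions. A learned per-channel gate, computed from both modalities, decides how much of the RGB feature versus the normal-map feature contributes to each fused channel.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def adaptive_fuse(f_rgb, f_normal, Wg, bg):
    """Gate-based fusion of two shallow feature vectors.

    The gate is computed from the concatenation of both modalities, so
    each channel can lean on whichever modality is more informative.
    """
    g = sigmoid(np.concatenate([f_rgb, f_normal]) @ Wg + bg)  # gate in (0, 1)
    return g * f_rgb + (1.0 - g) * f_normal

# With zero weights and a strongly positive bias the gate saturates at 1,
# so the output reduces to the RGB feature alone.
f_rgb = np.array([1.0, 2.0])
f_normal = np.array([0.0, -1.0])
fused = adaptive_fuse(f_rgb, f_normal, Wg=np.zeros((4, 2)), bg=np.full(2, 50.0))
```

In a real model the gate parameters would be trained end to end; the sketch only shows how a gate lets fusion adapt per channel rather than averaging modalities uniformly.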
Citations: 0
VCOS: Multi-scale information fusion to feature selection using fuzzy rough combination entropy
IF 18.6 | CAS Tier 1 (Computer Science) | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2024-12-31 | DOI: 10.1016/j.inffus.2024.102901
Binbin Sang, Lei Yang, Weihua Xu, Hongmei Chen, Tianrui Li, Wentao Li
Multi-scale information fusion has attracted extensive attention in data mining, in which optimal scale combination principles and feature selection algorithms are two core issues. However, the traditional optimal scale combination is obtained by requiring that the conditional feature scales be consistent with the decision classification. This consistency principle is too strict and not fault-tolerant: it makes the knowledge granularity too fine, is likely to degrade the performance of feature selection algorithms, and does not meet the needs of practical applications. Therefore, this paper develops a novel optimal scale combination selection method to fuse multi-scale information, establishes a new fuzzy rough set model, defines uncertainty measures, and designs a feature selection algorithm for Multi-scale Fuzzy Decision Systems (MsFDSs). First, the Variable-Consistency Optimal Scale (VCOS) selection principle is defined by introducing the variable-consistency rate. The VCOS-based fuzzy rough set model is then proposed, an uncertainty measure derived from this model is defined, and related properties are proved. Next, the VCOS-based Fuzzy Rough Combinatorial Entropy (VCOS-FRCE) is defined, and its monotonicity with respect to feature subsets and to the variable-consistency rate is proved. Finally, we define the relative reduct principle and the significance of features based on VCOS-FRCE, and design a forward greedy multi-scale feature selection algorithm. The proposed VCOS-based multi-scale fusion method can adjust the degree of consistency between knowledge granules and the decision classification according to actual needs; it generalizes better, can be applied to various kinds of complex data, and further improves the performance of the multi-scale feature selection method built on it. Experiments are performed on twelve public datasets from UCI, and the proposed algorithm is compared with eight existing algorithms. The experimental results show that the proposed algorithm can effectively remove redundant features and improve classification performance.
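The forward greedy selection loop described above can be sketched generically. The VCOS-FRCE significance measure itself is not reproduced here; `score` is a pluggable stand-in for any subset-quality function to be maximized, and the toy score in the example is purely illustrative.

```python
def forward_greedy_select(features, score):
    """Forward greedy feature selection skeleton.

    `score(subset)` is any set-quality measure to maximize (in the paper's
    setting it would be derived from VCOS-FRCE). Selection stops as soon
    as no remaining feature improves the current score.
    """
    selected, remaining = [], list(features)
    best = score(selected)
    while remaining:
        cand = max(remaining, key=lambda f: score(selected + [f]))
        s = score(selected + [cand])
        if s <= best:           # no candidate improves the score: stop
            break
        selected.append(cand)
        remaining.remove(cand)
        best = s
    return selected

# Toy score: reward two "useful" features, lightly penalize subset size.
useful = {1, 3}
toy_score = lambda sub: len(set(sub) & useful) - 0.01 * len(sub)
chosen = forward_greedy_select(range(4), toy_score)
```

Because the score is monotone in the relevant features but penalizes size, the loop picks exactly the useful features and then halts, which mirrors how a significance-based reduct search avoids redundant features.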
Citations: 0
Obfuscation-resilient detection of Android third-party libraries using multi-scale code dependency fusion
IF 18.6 | CAS Tier 1 (Computer Science) | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2024-12-31 | DOI: 10.1016/j.inffus.2024.102908
Zhao Zhang, Senlin Luo, Yongxin Lu, Limin Pan
Third-Party Library (TPL) detection is a crucial aspect of Android application security assessment, but it faces significant challenges due to code obfuscation. Existing methods often rely on single-scale features, such as class dependencies or instruction opcodes. This reliance can overlook critical dependencies, leading to incomplete library representation and reduced detection recall. Furthermore, the high similarity between a TPL and its adjacent versions causes overlaps in the feature space, reducing the accuracy of version identification. To address these limitations, we propose LibMD, a multi-scale code dependency fusion approach for TPL detection in Android apps. LibMD enhances library code representation by combining class reference syntax augmentation, cross-scale function mapping, and control flow reconstruction of basic blocks. It also extracts metadata dependencies and constructs a library dependency graph that integrates app-code similarity with multiple libraries. By applying Bayes’ theorem to compute posterior probabilities, LibMD effectively evaluates the likelihood of TPL integration and improves the precision of library version identification. Experimental results demonstrate that LibMD outperforms state-of-the-art methods across diverse datasets, achieving robust TPL detection and accurate version identification, even under various obfuscation techniques.
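The Bayesian step at the core of LibMD's decision can be written down directly. How the likelihoods are extracted from the library dependency graph is the paper's contribution and is not reproduced here; the function below is just Bayes' rule over app–library similarity evidence, with all numeric inputs assumed given.

```python
def posterior_presence(prior, lik_if_present, lik_if_absent):
    """Posterior probability that a TPL is integrated in the app.

    prior          -- P(library present) before seeing similarity evidence
    lik_if_present -- P(observed similarity | library present)
    lik_if_absent  -- P(observed similarity | library absent)
    """
    num = lik_if_present * prior
    return num / (num + lik_if_absent * (1.0 - prior))

# Evidence that is 9x more likely under "present" than "absent" turns a
# 50/50 prior into a 0.9 posterior.
p = posterior_presence(prior=0.5, lik_if_present=0.9, lik_if_absent=0.1)
```

The same update applies per candidate library version, which is how a posterior over versions can sharpen identification even when adjacent versions are highly similar.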
Citations: 0
Modal-invariant progressive representation for multimodal image registration
IF 18.6 | CAS Tier 1 (Computer Science) | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2024-12-31 | DOI: 10.1016/j.inffus.2024.102903
Jiangang Ding, Yuanlin Zhao, Lili Pei, Yihui Shan, Yiquan Du, Wei Li
Many applications, such as autonomous driving, rely heavily on multimodal data. However, differences in resolution, viewing angle, and optical path structure cause pixel misalignment between multimodal images, leading to distortions in the fusion result and edge artifacts. In addition to the widely used manual calibration, learning-based methods typically employ a two-stage registration process, referred to as “translating-then-registering”. However, the gap between modalities makes this approach less cohesive. It introduces more uncertainty during registration, misleading feature alignment at different locations and limiting the accuracy of the deformation field. To tackle these challenges, we introduce the Modality-Invariant Progressive Representation (MIPR) approach. The key behind MIPR is to decouple features from different modalities into a modality-invariant domain based on frequency bands, followed by a progressive correction at multiple feature scales. Specifically, MIPR consists of two main components: the Field Adaptive Fusion (FAF) module and the Progressive Field Estimation (PFE) module. FAF integrates all previous multi-scale deformation subfields. PFE progressively estimates the remaining deformation subfields at different scales. Furthermore, we propose a two-stage pretraining strategy for end-to-end registration. Our approach is simple and robust, achieving impressive visual results in several benchmark tasks, even surpassing the ground truth from manual calibration, and advancing downstream tasks.
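The frequency-band decoupling idea can be illustrated on a 1-D signal. MIPR operates on learned feature maps with its own filter design; here a plain moving-average low-pass stands in for the filter bank, purely to show the low/high split and that the two bands reconstruct the input exactly.

```python
import numpy as np

def band_split(signal, k=5):
    """Split a 1-D signal into low- and high-frequency bands.

    A moving-average low-pass is an illustrative stand-in for whatever
    filter bank a real method uses; by construction low + high
    reconstructs the input exactly.
    """
    kernel = np.ones(k) / k
    low = np.convolve(signal, kernel, mode="same")
    return low, signal - low

# Smooth structure lands in `low`; modality-specific fine detail and
# noise land in `high`.
x = np.sin(np.linspace(0.0, 6.0, 50)) \
    + 0.3 * np.random.default_rng(0).standard_normal(50)
low, high = band_split(x)
```

The registration-relevant point is that the low band tends to look similar across modalities, which is what makes a band-based modality-invariant domain plausible.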
Citations: 0
Distributed fusion filtering for multi-rate nonlinear systems with binary measurements under encryption and decryption scheme
IF 18.6 | CAS Tier 1 (Computer Science) | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2024-12-27 | DOI: 10.1016/j.inffus.2024.102900
Jun Hu, Shuting Fan, Raquel Caballero-Águila, Mingqing Zhu, Guangchen Zhang
This paper discusses the distributed fusion filtering problem for multi-rate nonlinear systems with binary measurements (BMs) based on an encryption and decryption scheme (EDS), in which the measurement outputs are represented by vectors with elements taking the values of 0 or 1. The expectation of the BMs is described by the cumulative distribution function of the standard normal distribution, where a newly defined random variable is utilized for reconstructing the BMs model. In order to ensure information security, the EDS is introduced in the data transmission process among the sensor nodes. Based on the information obtained, the local distributed filtering algorithm is proposed to obtain an upper bound on the local filtering error covariance, and the local filter gain is designed to minimize the resulting upper bound. In addition, the fusion filter is obtained with the parallel covariance intersection fusion criterion and the filtering performance is analyzed in terms of boundedness with theoretical proof. Finally, a target tracking experiment is taken to show the effectiveness and applicability of the proposed fusion filtering scheme.
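The covariance intersection step can be sketched with the standard two-estimate CI formula. The paper's parallel CI criterion and its boundedness analysis are not reproduced; below, the weight ω is picked by a simple trace-minimizing grid search, which is an illustrative choice rather than the paper's scheme.

```python
import numpy as np

def covariance_intersection(x1, P1, x2, P2, omega):
    """Standard covariance intersection of two estimates (x, P).

    Valid even when the cross-correlation between the two estimates is
    unknown, which is why CI is popular in distributed fusion.
    """
    P1i, P2i = np.linalg.inv(P1), np.linalg.inv(P2)
    Pf = np.linalg.inv(omega * P1i + (1.0 - omega) * P2i)
    xf = Pf @ (omega * P1i @ x1 + (1.0 - omega) * P2i @ x2)
    return xf, Pf

def ci_fuse(x1, P1, x2, P2, grid=99):
    """Choose omega by grid search to minimize the fused covariance trace."""
    omegas = np.linspace(0.01, 0.99, grid)
    w = min(omegas,
            key=lambda o: np.trace(covariance_intersection(x1, P1, x2, P2, o)[1]))
    return covariance_intersection(x1, P1, x2, P2, w)

# Sanity check: fusing two identical estimates returns them unchanged.
x = np.array([1.0, 2.0])
P = np.eye(2)
xf, Pf = ci_fuse(x, P, x, P)
```

With identical inputs, every ω yields the same result, so the grid search is a no-op; the interesting behavior appears when the two local filters disagree and have different covariances.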
Citations: 0
A survey on multi-view fusion for predicting links in biomedical bipartite networks: Methods and applications
IF 18.6 | CAS Tier 1 (Computer Science) | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2024-12-24 | DOI: 10.1016/j.inffus.2024.102894
Yuqing Qian, Yizheng Wang, Junkai Liu, Quan Zou, Yijie Ding, Xiaoyi Guo, Weiping Ding
Biomedical research increasingly relies on the analysis of complex interactions between biological entities, such as genes, proteins, and drugs. Although advancements in biomedical technologies have led to a vast accumulation of relational data, the high cost and time demands of wet-lab experiments have limited the number of verified interactions. Thus, computational methods have become essential for predicting potential links by leveraging diverse datasets to efficiently and accurately identify promising interactions. Multi-view fusion, which combines complementary information from multiple sources, has shown significant promise for enhancing prediction accuracy and robustness. We introduce the framework of multi-view fusion methods by elaborating on key components. This includes a comprehensive examination of multi-view data sources covering various omics and biological databases. We then describe the feature extraction techniques and explore how meaningful features can be derived from heterogeneous data formats. Next, we offer an in-depth review of the fusion strategies and categorize them as early fusion, late fusion, and fusion during the training phase. We discuss the advantages and limitations of each approach, emphasizing the need for sophisticated techniques that consider the unique attributes of biological link prediction. We also provide an overview of the commonly used datasets, evaluation metrics, and validation techniques. Commonly used datasets serve as reliable benchmarks for evaluating the computational models. Evaluation metrics and validation techniques are crucial for reliably assessing the performances of link prediction models. Subsequently, a comparative analysis of different fusion methods is conducted to empirically evaluate their performances on widely available biomedical datasets. This yielded valuable insights into the strengths and limitations of each approach in real-world applications. Finally, we identify key obstacles such as data heterogeneity, model robustness, and missing data and suggest potential directions for future research. Our findings offer valuable insights into the applications and future directions of multi-view fusion methods for biomedical link prediction, highlighting their potential to accelerate discovery and innovation in the field.
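The first two fusion categories the survey names can be contrasted in a few lines. The sketch is schematic: view shapes and weights are illustrative assumptions, and "modeling" is elided entirely; the point is only where in the pipeline the views are combined.

```python
import numpy as np

def early_fusion(views):
    """Early fusion: concatenate per-view feature vectors, then feed the
    single combined vector to one downstream model."""
    return np.concatenate(views)

def late_fusion(view_scores, weights=None):
    """Late fusion: each view gets its own model; their prediction scores
    are combined afterwards, here by a (weighted) average."""
    s = np.asarray(view_scores, dtype=float)
    w = (np.full(len(s), 1.0 / len(s)) if weights is None
         else np.asarray(weights, dtype=float))
    return float(w @ s)

# Two views of one drug-target pair: a 2-d omics feature and a 1-d
# sequence-similarity feature.
combined = early_fusion([np.array([1.0, 2.0]), np.array([3.0])])
fused_score = late_fusion([0.2, 0.8])
```

Fusion during training (the survey's third category) sits between these: the views keep separate encoders but are mixed inside the model, which a few lines of NumPy cannot meaningfully show.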
Citations: 0
OmniFuse: A general modality fusion framework for multi-modality learning on low-quality medical data
IF 18.6 | CAS Tier 1 (Computer Science) | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2024-12-24 | DOI: 10.1016/j.inffus.2024.102890
Yixuan Wu, Jintai Chen, Lianting Hu, Hongxia Xu, Huiying Liang, Jian Wu
Mirroring the practice of human medical experts, the integration of diverse medical examination modalities enhances the performance of predictive models in clinical settings. However, traditional multi-modal learning systems face significant challenges when dealing with low-quality medical data, which is common due to factors such as inconsistent data collection across multiple sites and varying sensor resolutions, as well as information loss due to poor data management. To address these issues, in this paper, we identify and explore three core technical challenges surrounding multi-modal learning on low-quality medical data: (i) the absence of informative modalities, (ii) imbalanced clinically useful information across modalities, and (iii) the entanglement of valuable information with noise in the data. To fully harness the potential of multi-modal low-quality data for automated high-precision disease diagnosis, we propose a general medical multi-modality learning framework that addresses these three core challenges on varying medical scenarios involving multiple modalities. To compensate for the absence of informative modalities, we utilize existing modalities to selectively integrate valuable information and then perform imputation, which is effective even in extreme absence scenarios. For the issue of modality information imbalance, we explicitly quantify the relationships between different modalities for individual samples, ensuring that the effective information from advantageous modalities is fully utilized. Moreover, to mitigate the conflation of information with noise, our framework traceably identifies and activates lazy modality combinations to eliminate noise and enhance data quality. Extensive experiments demonstrate the superiority and broad applicability of our framework. In predicting in-hospital mortality using joint EHR, Chest X-ray, and Report dara, our framework surpasses existing methods, improving the AUROC from 0.811 to 0.872. 
When applied to lung cancer pathological subtyping using PET, CT, and Report data, our approach achieves an impressive AUROC of 0.894.
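A minimal sketch of the imputation idea described above, in which an absent modality embedding is filled in by selectively pooling the modalities that are observed for a sample. The affinity matrix, the modality ordering (EHR, CXR, Report), and the weighting scheme below are illustrative assumptions, not the paper's actual mechanism:

```python
import numpy as np

# Hypothetical pairwise affinity between modalities; in the paper the
# modality relationships are quantified per sample, whereas this fixed
# matrix just keeps the example self-contained.
AFFINITY = np.array([[1.0, 0.6, 0.8],
                     [0.6, 1.0, 0.5],
                     [0.8, 0.5, 1.0]])

def impute_missing(feats, affinity=AFFINITY):
    """Fill absent modality embeddings (None) with an affinity-weighted
    average of the embeddings observed for this sample."""
    present = [i for i, f in enumerate(feats) if f is not None]
    if not present:
        raise ValueError("at least one modality must be observed")
    out = list(feats)
    for i, f in enumerate(feats):
        if f is None:
            w = np.array([affinity[i, j] for j in present], dtype=float)
            w /= w.sum()                                   # normalize weights
            out[i] = sum(wj * feats[j] for wj, j in zip(w, present))
    return out
```

Even in the extreme case where only one modality is observed, the weighted average degenerates gracefully to a copy of that modality's embedding.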
Depth cue fusion for event-based stereo depth estimation
IF 18.6 1区 计算机科学 Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date : 2024-12-24 DOI: 10.1016/j.inffus.2024.102891
Dipon Kumar Ghosh, Yong Ju Jung
Inspired by the biological retina, event cameras utilize dynamic vision sensors to capture pixel intensity changes asynchronously. Event cameras offer numerous advantages, such as high dynamic range, high temporal resolution, less motion blur, and low power consumption. These features make event cameras particularly well-suited for depth estimation, especially in challenging scenarios involving rapid motion and high dynamic range imaging conditions. The human visual system perceives scene depth by combining multiple depth cues such as monocular pictorial depth, stereo depth, and motion parallax. However, most existing algorithms for event-based depth estimation utilize only a single depth cue, such as either stereo depth or monocular depth. While it is feasible to estimate depth from a single cue, estimating dense disparity in challenging scenarios and lighting conditions remains difficult. Following this, we conduct extensive experiments to explore various methods for depth cue fusion. Inspired by the experiment results, in this study, we propose a fusion architecture that systematically incorporates multiple depth cues for event-based stereo depth estimation. To this end, we propose a depth cue fusion (DCF) network to fuse multiple depth cues by utilizing a novel fusion method called SpadeFormer. The proposed SpadeFormer is a fully context-aware fusion mechanism, which incorporates two modulation techniques (i.e., spatially adaptive denormalization (Spade) and cross-attention) for depth cue fusion in a transformer block. The adaptive denormalization modulates both input features by adjusting the global statistics of features in a cross manner, and the modulated features are further fused by the cross-attention technique. Experiments conducted on a real-world dataset show that our method reduces the one-pixel error rate by at least 47.63% (3.708 for the best existing method vs.
1.942 for ours) and the mean absolute error by 40.07% (0.302 for the best existing method vs. 0.181 for ours). The results reveal that the depth cue fusion method outperforms the state-of-the-art methods by significant margins and produces better disparity maps.
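The spatially adaptive denormalization (Spade) step can be illustrated with a minimal NumPy sketch: one feature map is stripped of its global statistics and then re-scaled and re-shifted per position using maps derived from the other modality. The learned convolutional layers that produce the scale and shift maps are replaced here by identity projections of the conditioning feature, an assumption for brevity; the full SpadeFormer additionally fuses the modulated features via cross-attention:

```python
import numpy as np

def spade_fuse(x, cond, eps=1e-5):
    """SPADE-style modulation sketch: remove the global statistics of
    feature map `x`, then modulate it per position with scale (gamma)
    and shift (beta) maps derived from the conditioning feature `cond`.
    Both x and cond are (H, W, C) arrays."""
    x_norm = (x - x.mean()) / (x.std() + eps)   # denormalization input
    gamma, beta = cond, 0.5 * cond              # stand-ins for learned convs
    return (1.0 + gamma) * x_norm + beta        # spatially varying modulation
```

With a zero conditioning feature the modulation is the identity on the normalized map, which makes the mechanics easy to check in isolation.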
Minimum adjustment consensus model for multi-person multi-criteria large scale decision-making with trust consistency propagation and opinion dynamics
IF 18.6 1区 计算机科学 Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date : 2024-12-23 DOI: 10.1016/j.inffus.2024.102883
Xi-Yu Wang, Ying-Ming Wang
The consensus reaching process (CRP) is a multi-round dynamic method essential for harmonizing the interests of multiple parties. With the rise of instant messaging and social media, individual social trust networks and structures have grown increasingly complex. Therefore, it is crucial to explore the inherent value of trust networks in the context of multi-person multi-criteria large-scale decision-making (MpMcLSDM) to facilitate consensus. This paper develops a minimum adjustment consensus model (MACM) for MpMcLSDM based on social trust network analysis (STNA). First, the consistency path rule and personal traits are defined through STNA, leading to a formulated strategy for completing the trust relationship. Subsequently, a novel centrality measure, informed by the consistency path rule, is proposed, and a weight method is devised to determine decision-maker (DM) weights and sub-cluster weights after clustering. This paper further elucidates the implications of consensus-level fluctuations for DM self-confidence and opinion inclination. Ultimately, a MACM is constructed within the MpMcLSDM framework, integrating opinion dynamics. A numerical example demonstrates the model's effectiveness, and comparisons with other methods show its rationale and improved performance.
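A toy sketch of a multi-round consensus-reaching loop in the spirit of a minimum adjustment model: in each round only the most deviating decision-maker is nudged partway toward the collective opinion until a consensus threshold is met. The consensus measure, the halfway adjustment rule, and the 0.9 threshold are simplifying assumptions, not the optimization formulated in the paper:

```python
import numpy as np

def consensus_level(opinions, weights):
    """Consensus as 1 minus the weighted mean distance of each
    decision-maker (DM) from the collective opinion (opinions in [0, 1])."""
    group = np.average(opinions, weights=weights)
    return 1.0 - np.average(np.abs(opinions - group), weights=weights), group

def minimum_adjustment_crp(opinions, weights, threshold=0.9, max_rounds=50):
    """Toy CRP loop: while consensus is below the threshold, move only
    the most deviating DM halfway toward the collective opinion, a
    greedy stand-in for a minimum-adjustment optimization."""
    op = np.asarray(opinions, dtype=float)
    for _ in range(max_rounds):
        level, group = consensus_level(op, weights)
        if level >= threshold:
            break
        k = int(np.argmax(np.abs(op - group)))   # most deviating DM
        op[k] += 0.5 * (group - op[k])           # partial, minimal shift
    return op, consensus_level(op, weights)[0]
```

Because each round shifts exactly one opinion and only partway, the total adjustment stays small, which is the intuition behind minimizing adjustment while still reaching consensus.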
A comprehensive survey of large language models and multimodal large language models in medicine
IF 18.6 1区 计算机科学 Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date : 2024-12-23 DOI: 10.1016/j.inffus.2024.102888
Hanguang Xiao, Feizhong Zhou, Xingyue Liu, Tianqi Liu, Zhipeng Li, Xin Liu, Xiaoxuan Huang
Since the release of ChatGPT and GPT-4, large language models (LLMs) and multimodal large language models (MLLMs) have attracted widespread attention for their exceptional capabilities in understanding, reasoning, and generation, introducing transformative paradigms for integrating artificial intelligence into medicine. This survey provides a comprehensive overview of the development, principles, application scenarios, challenges, and future directions of LLMs and MLLMs in medicine. Specifically, it begins by examining the paradigm shift, tracing the transition from traditional models to LLMs and MLLMs, and highlighting the unique advantages of these LLMs and MLLMs in medical applications. Next, the survey reviews existing medical LLMs and MLLMs, providing detailed guidance on their construction and evaluation in a clear and systematic manner. Subsequently, to underscore the substantial value of LLMs and MLLMs in healthcare, the survey explores five promising applications in the field. Finally, the survey addresses the challenges confronting medical LLMs and MLLMs and proposes practical strategies and future directions for their integration into medicine. In summary, this survey offers a comprehensive analysis of the technical methodologies and practical clinical applications of medical LLMs and MLLMs, with the goal of bridging the gap between these advanced technologies and clinical practice, thereby fostering the evolution of the next generation of intelligent healthcare systems.