CDF-DSR: Learning continuous depth field for self-supervised RGB-guided depth map super resolution
Siyuan Zhang, Jingxian Dong, Yan Ma, Hongsen Cai, Meijie Wang, Yan Li, Twaha B. Kabika, Xin Li, Wenguang Hou
Pub Date: 2024-12-19 | DOI: 10.1016/j.inffus.2024.102884
RGB-guided depth map super-resolution (GDSR) is a pivotal multimodal fusion task aimed at enhancing low-resolution (LR) depth maps using corresponding high-resolution (HR) RGB images as guidance. Existing approaches largely rely on supervised deep learning techniques, which are often hampered by limited generalization capabilities due to the challenges in collecting varied RGB-D datasets. To address this, we introduce a novel self-supervised paradigm that achieves depth map super-resolution utilizing just a single RGB-D sample, without any additional training data. Considering that scene depths are typically continuous, the proposed method conceptualizes the GDSR task as reconstructing a continuous depth field for each RGB-D sample. The depth field is represented as a neural network-based mapping from image coordinates to depth values, and optimized by leveraging the available HR RGB image and the LR depth map. Meanwhile, a novel cross-modal geometric consistency loss is proposed to enhance the detail accuracy of the depth field. Experimental results across multiple datasets demonstrate that the proposed method offers superior generalization compared to state-of-the-art GDSR methods and shows remarkable performance in practical applications. The test code is available at: https://github.com/zsy950116/CDF-DSR.
ADF-OCT: An advanced Assistive Diagnosis Framework for study-level macular optical coherence tomography
Weihao Gao, Wangting Li, Dong Fang, Zheng Gong, Chucheng Chen, Zhuo Deng, Fuju Rong, Lu Chen, Lujia Feng, Canfeng Huang, Jia Liang, Yijing Zhuang, Pengxue Wei, Ting Xie, Zhiyuan Niu, Fang Li, Xianling Tang, Bing Zhang, Zixia Zhou, Shaochong Zhang, Lan Ma
Pub Date: 2024-12-18 | DOI: 10.1016/j.inffus.2024.102877
Optical coherence tomography (OCT) is an advanced retinal imaging technique that enables non-invasive cross-sectional visualization of the retina, playing a crucial role in ophthalmology for detecting various macular lesions. While deep learning has shown promise in OCT image analysis, existing studies have primarily focused on broad, image-level disease diagnosis. This study introduces the Assistive Diagnosis Framework for OCT (ADF-OCT), which utilizes a dataset of over one million macular OCT images to construct a multi-label diagnostic model for common macular lesions and a medical report generation module. Our Multi-frame Medical Images Distillation method translates study-level multi-label annotations into image-level annotations, thereby enhancing diagnostic performance without additional annotation information. This approach significantly improves multi-label classification, achieving an AUROC of 0.9891 with a best macro F1 of 0.8533 and an accuracy of 0.9411. By refining the feature fusion strategy in multi-frame medical imaging, our framework substantially enhances the generation of medical reports for OCT B-scans, surpassing current solutions. This research presents an advanced development pipeline that utilizes existing clinical datasets to provide more accurate and comprehensive artificial intelligence-assisted diagnoses for macular OCT.
Diff-PC: Identity-preserving and 3D-aware controllable diffusion for zero-shot portrait customization
Yifang Xu, Benxiang Zhai, Chenyu Zhang, Ming Li, Yang Li, Sidan Du
Pub Date: 2024-12-12 | DOI: 10.1016/j.inffus.2024.102869
Portrait customization (PC) has recently garnered significant attention due to its potential applications. However, existing PC methods lack precise identity (ID) preservation and face control. To address these issues, we propose Diff-PC, a diffusion-based framework for zero-shot PC, which generates realistic portraits with high ID fidelity, specified facial attributes, and diverse backgrounds. Specifically, our approach employs a 3D face predictor to reconstruct 3D-aware facial priors encompassing the reference ID, target expressions, and poses. To capture fine-grained face details, we design an ID-Encoder that fuses local and global face features. Subsequently, we devise ID-Ctrl, which uses the 3D face to guide the alignment of ID features. We further introduce an ID-Injector to enhance ID fidelity and facial controllability. Finally, training on our collected ID-centric dataset improves face similarity and text-to-image (T2I) alignment. Extensive experiments demonstrate that Diff-PC surpasses state-of-the-art methods in ID preservation, face control, and T2I consistency. Notably, face similarity improves by about +3% on all datasets. Furthermore, our method is compatible with multi-style foundation models.
Con-MGSVM: Controllable multi-granularity support vector algorithm for classification and regression
Yabin Shao, Youlin Hua, Zengtai Gong, Xueqin Zhu, Yunlong Cheng, Laquan Li, Shuyin Xia
Pub Date: 2024-12-12 | DOI: 10.1016/j.inffus.2024.102867
The ν support vector machine (ν-SVM) is an enhanced algorithm derived from support vector machines, using the parameter ν to replace the original penalty coefficient C. Because of the narrower range of ν compared with the infinite range of C, ν-SVM generally outperforms the standard SVM. Granular ball computing is an information fusion method that enhances system robustness and reduces uncertainty. To further improve the efficiency and robustness of support vector algorithms, this paper introduces the concept of multigranularity granular balls and proposes the controllable multigranularity SVM (Con-MGSVM) and the controllable multigranularity support vector regression machine (Con-MGSVR). These models use granular computing theory, replacing original fine-grained points with coarse-grained “granular balls” as inputs to a classifier or regressor. By introducing the control parameter ν, the number of support granular balls can be further reduced, thereby enhancing computational efficiency and improving robustness and interpretability. Furthermore, this paper derives and solves the dual models of Con-MGSVM and Con-MGSVR and conducts a comparative study on the relationship between the granular ball SVM (GBSVM) and the Con-MGSVM model, elucidating the importance of control parameters. Experimental results demonstrate that Con-MGSVM and Con-MGSVR not only improve accuracy and fitting performance but also effectively reduce the number of support granular balls.
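The two ingredients named in the abstract, the ν-parameterization and the coarse-graining of points into granular balls, can be approximated with off-the-shelf tools. The sketch below stands in KMeans cluster centers for granular balls and uses scikit-learn's NuSVC for the ν-SVM; Con-MGSVM's dedicated dual formulation and its use of ball radii are not reproduced here.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_classification
from sklearn.svm import NuSVC

X, y = make_classification(n_samples=2000, n_features=10, random_state=0)

def granular_balls(X, y, balls_per_class=20):
    """Coarse-grain each class into 'granular balls': cluster centers stand
    in for the raw points, shrinking the set fed to the classifier."""
    centers, labels = [], []
    for c in np.unique(y):
        km = KMeans(n_clusters=balls_per_class, n_init=10, random_state=0)
        km.fit(X[y == c])
        centers.append(km.cluster_centers_)
        labels.append(np.full(balls_per_class, c))
    return np.vstack(centers), np.concatenate(labels)

Xb, yb = granular_balls(X, y)

# nu in (0, 1] bounds the fraction of margin errors and of support vectors,
# replacing the unbounded penalty coefficient C of the standard SVM.
clf = NuSVC(nu=0.3, kernel="rbf").fit(Xb, yb)
print("support balls:", clf.n_support_.sum(), "of", len(Xb))
```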
Advancements in perception system with multi-sensor fusion for embodied agents
Hao Du, Lu Ren, Yuanda Wang, Xiang Cao, Changyin Sun
Pub Date: 2024-12-11 | DOI: 10.1016/j.inffus.2024.102859
Multi-sensor data fusion perception technology, as a pivotal technique for achieving complex environmental perception and decision-making, has been garnering extensive attention from researchers. To date, there has been a lack of comprehensive review articles discussing the research progress of multi-sensor fusion perception systems for embodied agents, particularly in terms of analyzing the agent’s perception of itself and the surrounding scene. To address this gap and encourage further research, this study defines key terminology and analyzes datasets from the past two decades, focusing on advancements in multi-sensor fusion SLAM and multi-sensor scene perception. These resources can aid researchers in gaining a better understanding of the field and initiating research in the domain of multi-sensor fusion perception for embodied agents. In this survey, we begin with a brief introduction to common sensor types and their characteristics. We then delve into the multi-sensor fusion perception datasets tailored for the domains of autonomous driving, drones, unmanned ground vehicles, and unmanned surface vehicles. Following this, we discuss the classification and fundamental principles of existing multi-sensor data fusion SLAM algorithms, and present the experimental outcomes of various classical fusion frameworks. Subsequently, we comprehensively review the technologies of multi-sensor data fusion scene perception, including object detection, semantic segmentation, instance segmentation, and panoramic understanding. Finally, we summarize our findings and discuss potential future developments in multi-sensor fusion perception technology.
Insight at the right spot: Provide decisive subgraph information to Graph LLM with reinforcement learning
Tiesunlong Shen, Erik Cambria, Jin Wang, Yi Cai, Xuejie Zhang
Pub Date: 2024-12-11 | DOI: 10.1016/j.inffus.2024.102860
Large language models (LLMs) cannot see or understand graphs. Current Graph LLM methods transform graph structures into a format that LLMs understand, utilizing the LLM as a predictor to perform graph-learning tasks. However, these approaches have underperformed in graph-learning tasks. The issue arises because these methods typically rely on a fixed neighbor hop count for the target node, set by expert experience, limiting the LLM’s access to only a certain range of neighbor information. Moreover, due to the black-box nature of LLMs, it is challenging to determine which specific pieces of neighborhood information effectively assist them in making accurate inferences, so LLMs often fail to generate correct inferences. This study proposes to assist LLMs in gaining insight at the right spot by providing decisive subgraph information to the Graph LLM with reinforcement learning (Spider). A reinforcement subgraph detection module searches for the essential neighborhoods that influence the LLM’s predictions. A decisive node-guided network then guides the reinforcement subgraph network, allowing the LLM to rely more on crucial nodes within the essential neighborhood for its predictions. Essential neighborhood and decisive node information are provided to the LLM in text form, without any retraining. Experiments on five graph learning datasets demonstrate the effectiveness of the proposed model against all baselines, including GNN and LLM methods.
{"title":"Insight at the right spot: Provide decisive subgraph information to Graph LLM with reinforcement learning","authors":"Tiesunlong Shen, Erik Cambria, Jin Wang, Yi Cai, Xuejie Zhang","doi":"10.1016/j.inffus.2024.102860","DOIUrl":"https://doi.org/10.1016/j.inffus.2024.102860","url":null,"abstract":"Large language models (LLMs) cannot see or understand graphs. The current Graph LLM method transform graph structures into a format LLMs understands, utilizing LLM as a predictor to perform graph-learning task. However, these approaches have underperformed in graph-learning tasks. The issues arise because these methods typically rely on a fixed neighbor hop count for the target node set by expert experience, limiting the LLM’s access to only a certain range of neighbor information. Due to the black-box nature of LLM, it is challenging to determine which specific pieces of neighborhood information can effectively assist LLMs in making accurate inferences, which prevents LLMs from generating correct inferences. This study proposes to assist LLM in gaining insight at the right <ce:bold><ce:italic>s</ce:italic></ce:bold>pot by <ce:bold><ce:italic>p</ce:italic></ce:bold>rov<ce:bold><ce:italic>i</ce:italic></ce:bold>ding <ce:bold><ce:italic>de</ce:italic></ce:bold>cisive subgraph information to Graph LLM with <ce:bold><ce:italic>r</ce:italic></ce:bold>einforcement learning (<ce:bold><ce:italic>Spider</ce:italic></ce:bold>). A reinforcement subgraph detection module was designed to search for essential neighborhoods that influence LLM’s predictions. A decisive node-guided network was then applied to guide the reinforcement subgraph network, allowing LLMs to rely more on crucial nodes within the essential neighborhood for predictions. Essential neighborhood and decisive node information are provided to LLM in text form without the requirement of retraining. Experiments on five graph learning datasets demonstrate the effectiveness of the proposed model against all baselines, including GNN and LLM methods.","PeriodicalId":50367,"journal":{"name":"Information Fusion","volume":"50 5 1","pages":""},"PeriodicalIF":18.6,"publicationDate":"2024-12-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142825407","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A novel hybrid model combining Vision Transformers and Graph Convolutional Networks for monkeypox disease effective diagnosis
Bihter Das, Huseyin Alperen Dagdogen, Muhammed Onur Kaya, Resul Das
Pub Date: 2024-12-10 | DOI: 10.1016/j.inffus.2024.102858
Accurate diagnosis of monkeypox is challenging due to the limitations of current diagnostic techniques, which struggle to account for skin lesions’ complex visual and structural characteristics. This study aims to develop a novel hybrid model that combines the strengths of Vision Transformers (ViT), ResNet50, and AlexNet with Graph Convolutional Networks (GCN) to improve monkeypox diagnostic accuracy. Our method captures both the visual features and structural relationships within skin lesions, offering a more comprehensive approach to classification. Rigorous testing on two distinct datasets demonstrated that the ViT+GCN model achieved superior accuracy, particularly excelling in binary classification with 100% accuracy and multi-class classification with a 97% accuracy rate. These findings indicate that integrating visual and structural information enhances diagnostic reliability. While promising, this model requires further development, including larger datasets and optimization for real-time applications. Overall, this approach advances dermatological diagnostics and holds potential for broader applications in diagnosing other skin-related diseases.
{"title":"A novel hybrid model combining Vision Transformers and Graph Convolutional Networks for monkeypox disease effective diagnosis","authors":"Bihter Das, Huseyin Alperen Dagdogen, Muhammed Onur Kaya, Resul Das","doi":"10.1016/j.inffus.2024.102858","DOIUrl":"https://doi.org/10.1016/j.inffus.2024.102858","url":null,"abstract":"Accurate diagnosis of monkeypox is challenging due to the limitations of current diagnostic techniques, which struggle to account for skin lesions’ complex visual and structural characteristics. This study aims to develop a novel hybrid model that combines the strengths of Vision Transformers (ViT), ResNet50, and AlexNet with Graph Convolutional Networks (GCN) to improve monkeypox diagnostic accuracy. Our method captures both the visual features and structural relationships within skin lesions, offering a more comprehensive approach to classification. Rigorous testing on two distinct datasets demonstrated that the ViT+GCN model achieved superior accuracy, particularly excelling in binary classification with 100% accuracy and multi-class classification with a 97% accuracy rate. These findings indicate that integrating visual and structural information enhances diagnostic reliability. While promising, this model requires further development, including larger datasets and optimization for real-time applications. Overall, this approach advances dermatological diagnostics and holds potential for broader applications in diagnosing other skin-related diseases.","PeriodicalId":50367,"journal":{"name":"Information Fusion","volume":"3 1","pages":""},"PeriodicalIF":18.6,"publicationDate":"2024-12-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142825307","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Efficient self-supervised heterogeneous graph representation learning with reconstruction
Yujie Mo, Heng Tao Shen, Xiaofeng Zhu
Pub Date: 2024-12-10 | DOI: 10.1016/j.inffus.2024.102846
Heterogeneous graph representation learning (HGRL), as a powerful technique for processing heterogeneous graph data, has shown superior performance and attracted increasing attention. However, existing HGRL methods still face two issues: (i) capturing the consistency among different meta-path-based views incurs expensive computation costs and can cause dimension collapse; (ii) they ignore the complementarity within each meta-path-based view, which degrades the model’s effectiveness. To alleviate these issues, in this paper we propose a new self-supervised HGRL framework that captures the consistency among different views, maintains the complementarity within each view, and avoids dimension collapse. Specifically, the proposed method employs a correlation loss to capture the consistency among different views and reduce dimension redundancy, as well as a reconstruction loss to maintain complementarity within each view and benefit downstream tasks. We further theoretically prove that the proposed method can effectively incorporate task-relevant information into node representations, thereby enhancing performance in downstream tasks. Extensive experiments on multiple public datasets validate the effectiveness and efficiency of the proposed method on downstream tasks.
{"title":"Efficient self-supervised heterogeneous graph representation learning with reconstruction","authors":"Yujie Mo, Heng Tao Shen, Xiaofeng Zhu","doi":"10.1016/j.inffus.2024.102846","DOIUrl":"https://doi.org/10.1016/j.inffus.2024.102846","url":null,"abstract":"Heterogeneous graph representation learning (HGRL), as one of powerful techniques to process the heterogeneous graph data, has shown superior performance and attracted increasing attention. However, existing HGRL methods still face issues to be addressed: (i) They capture the consistency among different meta-path-based views to induce expensive computation costs and possibly cause dimension collapse. (ii) They ignore the complementarity within each meta-path-based view to degrade the model’s effectiveness. To alleviate these issues, in this paper, we propose a new self-supervised HGRL framework to capture the consistency among different views, maintain the complementarity within each view, and avoid dimension collapse. Specifically, the proposed method investigates the correlation loss to capture the consistency among different views and reduce the dimension redundancy, as well as investigates the reconstruction loss to maintain complementarity within each view to benefit downstream tasks. We further theoretically prove that the proposed method can effectively incorporate task-relevant information into node representations, thereby enhancing performance in downstream tasks. Extensive experiments on multiple public datasets validate the effectiveness and efficiency of the proposed method on downstream tasks.","PeriodicalId":50367,"journal":{"name":"Information Fusion","volume":"116 1","pages":""},"PeriodicalIF":18.6,"publicationDate":"2024-12-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142825312","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
PHIM-MIL: Multiple instance learning with prototype similarity-guided feature fusion and hard instance mining for whole slide image classification
Yining Xie, Zequn Liu, Jing Zhao, Jiayi Ma
Pub Date: 2024-12-10 | DOI: 10.1016/j.inffus.2024.102847
The large size of whole slide images (WSIs) in pathology makes it difficult to obtain fine-grained annotations. Therefore, multi-instance learning (MIL) methods are typically utilized to classify histopathology WSIs. However, current models overly focus on the local features of instances, neglecting the connection between local and global features. Additionally, they tend to recognize simple instances while struggling to distinguish hard instances. To address these issues, we design a two-stage MIL model training approach (PHIM-MIL). In the first stage, a downstream aggregation model is pre-trained to equip it with the ability to recognize simple instances. In the second stage, we integrate global information and make the model focus on mining hard instances. First, the similarity between instances and prototypes is leveraged for weighted aggregation, yielding semi-global features that help the model understand the relationship between each instance and the global features. Then, instance features and semi-global features are fused to enrich instance feature information, bringing similar instances closer while alienating dissimilar ones. Finally, a hard instance mining strategy is employed to process the fused features, improving the pre-trained aggregation model’s capability to recognize and handle hard instances. Extensive experimental results on the GastricCancer and Camelyon16 datasets demonstrate that PHIM-MIL outperforms other state-of-the-art methods in terms of performance and computing cost. Meanwhile, PHIM-MIL continues to deliver consistent performance improvements when the feature extraction network is replaced.
Fusion-enhanced multi-label feature selection with sparse supplementation
Yonghao Li, Xiangkun Wang, Xin Yang, Wanfu Gao, Weiping Ding, Tianrui Li
Pub Date: 2024-12-05 | DOI: 10.1016/j.inffus.2024.102813
The exponential increase of multi-label data across various domains demands the development of effective feature selection methods. However, current sparse-learning-based feature selection methods that use the LASSO norm and the l2,1-norm fail to handle two crucial issues for multi-label data. First, LASSO-based methods remove features with zero-weight values during the feature selection process, some of which may have a certain degree of classification ability. Second, l2,1-norm-based methods may select redundant features that lead to inefficient classification results. To overcome these issues, we propose a novel sparse supplementation norm that combines inner product regularization and the l2,1-norm into a fusion norm. This fusion norm is designed to enhance the sparsity of feature selection models by leveraging the inherent row-sparse property of the l2,1-norm. Specifically, the inner product regularization term retains features with potentially useful classification information, which may be discarded by traditional LASSO-based methods, while also removing the redundant features introduced by traditional l2,1-norm-based methods. By incorporating this fusion norm into the Sparse-supplementation Regularized multi-label Feature Selection (SRFS) model, our method mitigates feature omission and feature redundancy, ensuring more effective and efficient feature selection for multi-label classification tasks. Experimental results on various benchmark datasets validate the efficiency and effectiveness of our proposed SRFS model.