CDF-DSR: Learning continuous depth field for self-supervised RGB-guided depth map super resolution
Siyuan Zhang, Jingxian Dong, Yan Ma, Hongsen Cai, Meijie Wang, Yan Li, Twaha B. Kabika, Xin Li, Wenguang Hou
Pub Date: 2024-12-19 | DOI: 10.1016/j.inffus.2024.102884
RGB-guided depth map super-resolution (GDSR) is a pivotal multimodal fusion task aimed at enhancing low-resolution (LR) depth maps using corresponding high-resolution (HR) RGB images as guidance. Existing approaches largely rely on supervised deep learning techniques, which are often hampered by limited generalization capabilities due to the challenges in collecting varied RGB-D datasets. To address this, we introduce a novel self-supervised paradigm that achieves depth map super-resolution utilizing just a single RGB-D sample, without any additional training data. Considering that scene depths are typically continuous, the proposed method conceptualizes the GDSR task as reconstructing a continuous depth field for each RGB-D sample. The depth field is represented as a neural network-based mapping from image coordinates to depth values, and optimized by leveraging the available HR RGB image and the LR depth map. Meanwhile, a novel cross-modal geometric consistency loss is proposed to enhance the detail accuracy of the depth field. Experimental results across multiple datasets demonstrate that the proposed method offers superior generalization compared to state-of-the-art GDSR methods and shows remarkable performance in practical applications. The test code is available at: https://github.com/zsy950116/CDF-DSR.
ADF-OCT: An advanced Assistive Diagnosis Framework for study-level macular optical coherence tomography
Weihao Gao, Wangting Li, Dong Fang, Zheng Gong, Chucheng Chen, Zhuo Deng, Fuju Rong, Lu Chen, Lujia Feng, Canfeng Huang, Jia Liang, Yijing Zhuang, Pengxue Wei, Ting Xie, Zhiyuan Niu, Fang Li, Xianling Tang, Bing Zhang, Zixia Zhou, Shaochong Zhang, Lan Ma
Pub Date: 2024-12-18 | DOI: 10.1016/j.inffus.2024.102877
Optical coherence tomography (OCT) is an advanced retinal imaging technique that enables non-invasive cross-sectional visualization of the retina, playing a crucial role in ophthalmology for detecting various macular lesions. While deep learning has shown promise in OCT image analysis, existing studies have primarily focused on broad, image-level disease diagnosis. This study introduces the Assistive Diagnosis Framework for OCT (ADF-OCT), which utilizes a dataset of over one million macular OCT images to construct a multi-label diagnostic model for common macular lesions and a medical report generation module. Our Multi-frame Medical Images Distillation method translates study-level multi-label annotations into image-level annotations, thereby enhancing diagnostic performance without additional annotation information. This approach significantly improves multi-label classification, achieving an AUROC of 0.9891 with a best macro F1 of 0.8533 and an accuracy of 0.9411. By refining the feature fusion strategy in multi-frame medical imaging, our framework substantially enhances the generation of medical reports for OCT B-scans, surpassing current solutions. This research presents an advanced development pipeline that utilizes existing clinical datasets to provide more accurate and comprehensive artificial intelligence-assisted diagnoses for macular OCT.
Diff-PC: Identity-preserving and 3D-aware controllable diffusion for zero-shot portrait customization
Yifang Xu, Benxiang Zhai, Chenyu Zhang, Ming Li, Yang Li, Sidan Du
Pub Date: 2024-12-12 | DOI: 10.1016/j.inffus.2024.102869
Portrait customization (PC) has recently garnered significant attention due to its potential applications. However, existing PC methods lack precise identity (ID) preservation and face control. To address these issues, we propose Diff-PC, a diffusion-based framework for zero-shot PC, which generates realistic portraits with high ID fidelity, specified facial attributes, and diverse backgrounds. Specifically, our approach employs a 3D face predictor to reconstruct 3D-aware facial priors encompassing the reference ID, target expressions, and poses. To capture fine-grained face details, we design an ID-Encoder that fuses local and global face features. Subsequently, we devise ID-Ctrl, which uses the 3D face to guide the alignment of ID features. We further introduce an ID-Injector to enhance ID fidelity and facial controllability. Finally, training on our collected ID-centric dataset improves face similarity and text-to-image (T2I) alignment. Extensive experiments demonstrate that Diff-PC surpasses state-of-the-art methods in ID preservation, face control, and T2I consistency. Notably, face similarity improves by about +3% on all datasets. Furthermore, our method is compatible with multi-style foundation models.
Con-MGSVM: Controllable multi-granularity support vector algorithm for classification and regression
Yabin Shao, Youlin Hua, Zengtai Gong, Xueqin Zhu, Yunlong Cheng, Laquan Li, Shuyin Xia
Pub Date: 2024-12-12 | DOI: 10.1016/j.inffus.2024.102867
The ν support vector machine (ν-SVM) is an enhanced algorithm derived from support vector machines, using the parameter ν to replace the original penalty coefficient C. Because of the narrower range of ν compared with the infinite range of C, ν-SVM generally outperforms the standard SVM. Granular ball computing is an information fusion method that enhances system robustness and reduces uncertainty. To further improve the efficiency and robustness of support vector algorithms, this paper introduces the concept of multigranularity granular balls and proposes the controllable multigranularity SVM (Con-MGSVM) and the controllable multigranularity support vector regression machine (Con-MGSVR). These models use granular computing theory, replacing original fine-grained points with coarse-grained “granular balls” as inputs to a classifier or regressor. By introducing the control parameter ν, the number of support granular balls can be further reduced, thereby enhancing computational efficiency and improving robustness and interpretability. Furthermore, this paper derives and solves the dual models of Con-MGSVM and Con-MGSVR and conducts a comparative study on the relationship between the granular ball SVM (GBSVM) and the Con-MGSVM model, elucidating the importance of control parameters. Experimental results demonstrate that Con-MGSVM and Con-MGSVR not only improve accuracy and fitting performance but also effectively reduce the number of support granular balls.
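The two ingredients named in the abstract, the ν-parameterization and the coarse-graining of points into granular balls, can be approximated with off-the-shelf tools. The sketch below stands in KMeans cluster centers for granular balls and uses scikit-learn's NuSVC for the ν-SVM; Con-MGSVM's dedicated dual formulation and its use of ball radii are not reproduced here.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_classification
from sklearn.svm import NuSVC

X, y = make_classification(n_samples=2000, n_features=10, random_state=0)

def granular_balls(X, y, balls_per_class=20):
    """Coarse-grain each class into 'granular balls': cluster centers stand
    in for the raw points, shrinking the set fed to the classifier."""
    centers, labels = [], []
    for c in np.unique(y):
        km = KMeans(n_clusters=balls_per_class, n_init=10, random_state=0)
        km.fit(X[y == c])
        centers.append(km.cluster_centers_)
        labels.append(np.full(balls_per_class, c))
    return np.vstack(centers), np.concatenate(labels)

Xb, yb = granular_balls(X, y)

# nu in (0, 1] bounds the fraction of margin errors and of support vectors,
# replacing the unbounded penalty coefficient C of the standard SVM.
clf = NuSVC(nu=0.3, kernel="rbf").fit(Xb, yb)
print("support balls:", clf.n_support_.sum(), "of", len(Xb))
```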
Advancements in perception system with multi-sensor fusion for embodied agents
Hao Du, Lu Ren, Yuanda Wang, Xiang Cao, Changyin Sun
Pub Date: 2024-12-11 | DOI: 10.1016/j.inffus.2024.102859
Multi-sensor data fusion perception technology, as a pivotal technique for achieving complex environmental perception and decision-making, has been garnering extensive attention from researchers. To date, there has been a lack of comprehensive review articles discussing the research progress of multi-sensor fusion perception systems for embodied agents, particularly in terms of analyzing the agent’s perception of itself and the surrounding scene. To address this gap and encourage further research, this study defines key terminology and analyzes datasets from the past two decades, focusing on advancements in multi-sensor fusion SLAM and multi-sensor scene perception. These resources can aid researchers in gaining a better understanding of the field and initiating research in the domain of multi-sensor fusion perception for embodied agents. In this survey, we begin with a brief introduction to common sensor types and their characteristics. We then delve into the multi-sensor fusion perception datasets tailored for the domains of autonomous driving, drones, unmanned ground vehicles, and unmanned surface vehicles. Following this, we discuss the classification and fundamental principles of existing multi-sensor data fusion SLAM algorithms, and present the experimental outcomes of various classical fusion frameworks. Subsequently, we comprehensively review the technologies of multi-sensor data fusion scene perception, including object detection, semantic segmentation, instance segmentation, and panoramic understanding. Finally, we summarize our findings and discuss potential future developments in multi-sensor fusion perception technology.
Insight at the right spot: Provide decisive subgraph information to Graph LLM with reinforcement learning
Tiesunlong Shen, Erik Cambria, Jin Wang, Yi Cai, Xuejie Zhang
Pub Date: 2024-12-11 | DOI: 10.1016/j.inffus.2024.102860
Large language models (LLMs) cannot see or understand graphs. Current Graph LLM methods transform graph structures into a format that LLMs understand, utilizing the LLM as a predictor to perform graph-learning tasks. However, these approaches have underperformed in graph-learning tasks. The issue arises because these methods typically rely on a fixed neighbor hop count for the target node, set by expert experience, limiting the LLM’s access to only a certain range of neighbor information. Moreover, due to the black-box nature of LLMs, it is challenging to determine which specific pieces of neighborhood information effectively assist them in making accurate inferences, so LLMs often fail to generate correct inferences. This study proposes to assist LLMs in gaining insight at the right spot by providing decisive subgraph information to the Graph LLM with reinforcement learning (Spider). A reinforcement subgraph detection module searches for the essential neighborhoods that influence the LLM’s predictions. A decisive node-guided network then guides the reinforcement subgraph network, allowing the LLM to rely more on crucial nodes within the essential neighborhood for its predictions. Essential neighborhood and decisive node information are provided to the LLM in text form, without any retraining. Experiments on five graph learning datasets demonstrate the effectiveness of the proposed model against all baselines, including GNN and LLM methods.
{"title":"Insight at the right spot: Provide decisive subgraph information to Graph LLM with reinforcement learning","authors":"Tiesunlong Shen, Erik Cambria, Jin Wang, Yi Cai, Xuejie Zhang","doi":"10.1016/j.inffus.2024.102860","DOIUrl":"https://doi.org/10.1016/j.inffus.2024.102860","url":null,"abstract":"Large language models (LLMs) cannot see or understand graphs. The current Graph LLM method transform graph structures into a format LLMs understands, utilizing LLM as a predictor to perform graph-learning task. However, these approaches have underperformed in graph-learning tasks. The issues arise because these methods typically rely on a fixed neighbor hop count for the target node set by expert experience, limiting the LLM’s access to only a certain range of neighbor information. Due to the black-box nature of LLM, it is challenging to determine which specific pieces of neighborhood information can effectively assist LLMs in making accurate inferences, which prevents LLMs from generating correct inferences. This study proposes to assist LLM in gaining insight at the right <ce:bold><ce:italic>s</ce:italic></ce:bold>pot by <ce:bold><ce:italic>p</ce:italic></ce:bold>rov<ce:bold><ce:italic>i</ce:italic></ce:bold>ding <ce:bold><ce:italic>de</ce:italic></ce:bold>cisive subgraph information to Graph LLM with <ce:bold><ce:italic>r</ce:italic></ce:bold>einforcement learning (<ce:bold><ce:italic>Spider</ce:italic></ce:bold>). A reinforcement subgraph detection module was designed to search for essential neighborhoods that influence LLM’s predictions. A decisive node-guided network was then applied to guide the reinforcement subgraph network, allowing LLMs to rely more on crucial nodes within the essential neighborhood for predictions. Essential neighborhood and decisive node information are provided to LLM in text form without the requirement of retraining. Experiments on five graph learning datasets demonstrate the effectiveness of the proposed model against all baselines, including GNN and LLM methods.","PeriodicalId":50367,"journal":{"name":"Information Fusion","volume":"50 5 1","pages":""},"PeriodicalIF":18.6,"publicationDate":"2024-12-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142825407","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A novel hybrid model combining Vision Transformers and Graph Convolutional Networks for monkeypox disease effective diagnosis
Bihter Das, Huseyin Alperen Dagdogen, Muhammed Onur Kaya, Resul Das
Pub Date: 2024-12-10 | DOI: 10.1016/j.inffus.2024.102858
Accurate diagnosis of monkeypox is challenging due to the limitations of current diagnostic techniques, which struggle to account for skin lesions’ complex visual and structural characteristics. This study aims to develop a novel hybrid model that combines the strengths of Vision Transformers (ViT), ResNet50, and AlexNet with Graph Convolutional Networks (GCN) to improve monkeypox diagnostic accuracy. Our method captures both the visual features and structural relationships within skin lesions, offering a more comprehensive approach to classification. Rigorous testing on two distinct datasets demonstrated that the ViT+GCN model achieved superior accuracy, particularly excelling in binary classification with 100% accuracy and multi-class classification with a 97% accuracy rate. These findings indicate that integrating visual and structural information enhances diagnostic reliability. While promising, this model requires further development, including larger datasets and optimization for real-time applications. Overall, this approach advances dermatological diagnostics and holds potential for broader applications in diagnosing other skin-related diseases.
{"title":"A novel hybrid model combining Vision Transformers and Graph Convolutional Networks for monkeypox disease effective diagnosis","authors":"Bihter Das, Huseyin Alperen Dagdogen, Muhammed Onur Kaya, Resul Das","doi":"10.1016/j.inffus.2024.102858","DOIUrl":"https://doi.org/10.1016/j.inffus.2024.102858","url":null,"abstract":"Accurate diagnosis of monkeypox is challenging due to the limitations of current diagnostic techniques, which struggle to account for skin lesions’ complex visual and structural characteristics. This study aims to develop a novel hybrid model that combines the strengths of Vision Transformers (ViT), ResNet50, and AlexNet with Graph Convolutional Networks (GCN) to improve monkeypox diagnostic accuracy. Our method captures both the visual features and structural relationships within skin lesions, offering a more comprehensive approach to classification. Rigorous testing on two distinct datasets demonstrated that the ViT+GCN model achieved superior accuracy, particularly excelling in binary classification with 100% accuracy and multi-class classification with a 97% accuracy rate. These findings indicate that integrating visual and structural information enhances diagnostic reliability. While promising, this model requires further development, including larger datasets and optimization for real-time applications. Overall, this approach advances dermatological diagnostics and holds potential for broader applications in diagnosing other skin-related diseases.","PeriodicalId":50367,"journal":{"name":"Information Fusion","volume":"3 1","pages":""},"PeriodicalIF":18.6,"publicationDate":"2024-12-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142825307","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Efficient self-supervised heterogeneous graph representation learning with reconstruction
Yujie Mo, Heng Tao Shen, Xiaofeng Zhu
Pub Date: 2024-12-10 | DOI: 10.1016/j.inffus.2024.102846
Heterogeneous graph representation learning (HGRL), as a powerful technique for processing heterogeneous graph data, has shown superior performance and attracted increasing attention. However, existing HGRL methods still face two issues: (i) capturing the consistency among different meta-path-based views incurs expensive computation costs and can cause dimension collapse; (ii) they ignore the complementarity within each meta-path-based view, which degrades the model’s effectiveness. To alleviate these issues, in this paper we propose a new self-supervised HGRL framework that captures the consistency among different views, maintains the complementarity within each view, and avoids dimension collapse. Specifically, the proposed method employs a correlation loss to capture the consistency among different views and reduce dimension redundancy, as well as a reconstruction loss to maintain complementarity within each view and benefit downstream tasks. We further theoretically prove that the proposed method can effectively incorporate task-relevant information into node representations, thereby enhancing performance in downstream tasks. Extensive experiments on multiple public datasets validate the effectiveness and efficiency of the proposed method on downstream tasks.
{"title":"Efficient self-supervised heterogeneous graph representation learning with reconstruction","authors":"Yujie Mo, Heng Tao Shen, Xiaofeng Zhu","doi":"10.1016/j.inffus.2024.102846","DOIUrl":"https://doi.org/10.1016/j.inffus.2024.102846","url":null,"abstract":"Heterogeneous graph representation learning (HGRL), as one of powerful techniques to process the heterogeneous graph data, has shown superior performance and attracted increasing attention. However, existing HGRL methods still face issues to be addressed: (i) They capture the consistency among different meta-path-based views to induce expensive computation costs and possibly cause dimension collapse. (ii) They ignore the complementarity within each meta-path-based view to degrade the model’s effectiveness. To alleviate these issues, in this paper, we propose a new self-supervised HGRL framework to capture the consistency among different views, maintain the complementarity within each view, and avoid dimension collapse. Specifically, the proposed method investigates the correlation loss to capture the consistency among different views and reduce the dimension redundancy, as well as investigates the reconstruction loss to maintain complementarity within each view to benefit downstream tasks. We further theoretically prove that the proposed method can effectively incorporate task-relevant information into node representations, thereby enhancing performance in downstream tasks. Extensive experiments on multiple public datasets validate the effectiveness and efficiency of the proposed method on downstream tasks.","PeriodicalId":50367,"journal":{"name":"Information Fusion","volume":"116 1","pages":""},"PeriodicalIF":18.6,"publicationDate":"2024-12-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142825312","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
PHIM-MIL: Multiple instance learning with prototype similarity-guided feature fusion and hard instance mining for whole slide image classification
Yining Xie, Zequn Liu, Jing Zhao, Jiayi Ma
Pub Date: 2024-12-10 | DOI: 10.1016/j.inffus.2024.102847
The large size of whole slide images (WSIs) in pathology makes it difficult to obtain fine-grained annotations. Therefore, multi-instance learning (MIL) methods are typically utilized to classify histopathology WSIs. However, current models overly focus on the local features of instances, neglecting the connection between local and global features. Additionally, they tend to recognize simple instances while struggling to distinguish hard instances. To address these issues, we design a two-stage MIL model training approach (PHIM-MIL). In the first stage, a downstream aggregation model is pre-trained to equip it with the ability to recognize simple instances. In the second stage, we integrate global information and make the model focus on mining hard instances. First, the similarity between instances and prototypes is leveraged for weighted aggregation, yielding semi-global features that help the model understand the relationship between each instance and the global features. Then, instance features and semi-global features are fused to enrich instance feature information, bringing similar instances closer while alienating dissimilar ones. Finally, a hard instance mining strategy is employed to process the fused features, improving the pre-trained aggregation model’s capability to recognize and handle hard instances. Extensive experimental results on the GastricCancer and Camelyon16 datasets demonstrate that PHIM-MIL outperforms other state-of-the-art methods in terms of performance and computing cost. Meanwhile, PHIM-MIL continues to deliver consistent performance improvements when the feature extraction network is replaced.
Fusion-enhanced multi-label feature selection with sparse supplementation
Yonghao Li, Xiangkun Wang, Xin Yang, Wanfu Gao, Weiping Ding, Tianrui Li
Pub Date: 2024-12-05 | DOI: 10.1016/j.inffus.2024.102813
The exponential increase of multi-label data across various domains demands the development of effective feature selection methods. However, current sparse-learning-based feature selection methods that use the LASSO norm and the l2,1-norm fail to handle two crucial issues for multi-label data. First, LASSO-based methods remove features with zero-weight values during the feature selection process, some of which may have a certain degree of classification ability. Second, l2,1-norm-based methods may select redundant features that lead to inefficient classification results. To overcome these issues, we propose a novel sparse supplementation norm that combines inner product regularization and the l2,1-norm into a fusion norm. This fusion norm is designed to enhance the sparsity of feature selection models by leveraging the inherent row-sparse property of the l2,1-norm. Specifically, the inner product regularization term retains features with potentially useful classification information, which may be discarded by traditional LASSO-based methods, while also removing the redundant features introduced by traditional l2,1-norm-based methods. By incorporating this fusion norm into the Sparse-supplementation Regularized multi-label Feature Selection (SRFS) model, our method mitigates feature omission and feature redundancy, ensuring more effective and efficient feature selection for multi-label classification tasks. Experimental results on various benchmark datasets validate the efficiency and effectiveness of our proposed SRFS model.