Text-guided multimodal depression detection via cross-modal feature reconstruction and decomposition
Pub Date: 2024-12-22 | DOI: 10.1016/j.inffus.2024.102861
Ziqiang Chen, Dandan Wang, Liangliang Lou, Shiqing Zhang, Xiaoming Zhao, Shuqiang Jiang, Jun Yu, Jun Xiao
Depression, a widespread and debilitating mental health disorder, requires early detection to facilitate effective intervention. Automated depression detection integrating audio with text modalities is a challenging yet significant problem due to information redundancy and inter-modal heterogeneity across modalities. Prior works usually fail to fully learn the interaction of audio–text modalities for depression detection in an explicit manner. To address these issues, this work proposes a novel text-guided multimodal depression detection method based on a cross-modal feature reconstruction and decomposition framework. The proposed method takes the text modality as the core modality to guide the model to reconstruct comprehensive audio features for cross-modal feature decomposition tasks. Moreover, the designed cross-modal feature reconstruction and decomposition framework aims to disentangle the shared and private features from the text-guided reconstructed comprehensive audio features for subsequent multimodal fusion. In addition, a bi-directional cross-attention module is designed to interactively learn simultaneous and mutual correlations across modalities for feature enhancement. Extensive experiments on the DAIC-WoZ and E-DAIC datasets show the superiority of the proposed method on multimodal depression detection tasks, outperforming state-of-the-art approaches.
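The bi-directional cross-attention idea can be pictured with a minimal PyTorch sketch: each modality's token sequence attends to the other, and the two enhanced sequences are pooled and fused. This is a generic illustration, not the paper's implementation; the module names, dimensions, and mean-pooling choice are assumptions.

```python
import torch
import torch.nn as nn

class BiDirectionalCrossAttention(nn.Module):
    """Sketch: audio tokens attend to text tokens and vice versa, then both
    enhanced streams are pooled and fused into one clip-level feature."""
    def __init__(self, dim=256, num_heads=4):
        super().__init__()
        self.audio_to_text = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.text_to_audio = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.fuse = nn.Linear(2 * dim, dim)

    def forward(self, audio_feats, text_feats):
        # audio queries the text sequence, text queries the audio sequence
        audio_enh, _ = self.audio_to_text(audio_feats, text_feats, text_feats)
        text_enh, _ = self.text_to_audio(text_feats, audio_feats, audio_feats)
        pooled = torch.cat([audio_enh.mean(dim=1), text_enh.mean(dim=1)], dim=-1)
        return self.fuse(pooled)

if __name__ == "__main__":
    audio = torch.randn(2, 120, 256)   # (batch, audio frames, dim)
    text = torch.randn(2, 40, 256)     # (batch, text tokens, dim)
    print(BiDirectionalCrossAttention()(audio, text).shape)  # torch.Size([2, 256])
```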
{"title":"Text-guided multimodal depression detection via cross-modal feature reconstruction and decomposition","authors":"Ziqiang Chen, Dandan Wang, Liangliang Lou, Shiqing Zhang, Xiaoming Zhao, Shuqiang Jiang, Jun Yu, Jun Xiao","doi":"10.1016/j.inffus.2024.102861","DOIUrl":"https://doi.org/10.1016/j.inffus.2024.102861","url":null,"abstract":"Depression, a widespread and debilitating mental health disorder, requires early detection to facilitate effective intervention. Automated depression detection integrating audio with text modalities is a challenging yet significant issue due to the information redundancy and inter-modal heterogeneity across modalities. Prior works usually fail to fully learn the interaction of audio–text modalities for depression detection in an explicit manner. To address these issues, this work proposes a novel text-guided multimdoal depression detection method based on a cross-modal feature reconstruction and decomposition framework. The proposed method takes the text modality as the core modality to guide the model to reconstruct comprehensive audio features for cross-modal feature decomposition tasks. Moreover, the designed cross-modal feature reconstruction and decomposition framework aims to disentangle the shared and private features from the text-guided reconstructed comprehensive audio features for subsequent multimodal fusion. Besides, a bi-directional cross-attention module is designed to interactively learn simultaneous and mutual correlations across modalities for feature enhancement. Extensive experiments are performed on the DAIC-WoZ and E-DAIC datasets, and the results show the superiority of the proposed method on multimodal depression detection tasks, outperforming the state-of-the-arts.","PeriodicalId":50367,"journal":{"name":"Information Fusion","volume":"65 1","pages":""},"PeriodicalIF":18.6,"publicationDate":"2024-12-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142901775","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
ORC-GNN: A novel open set recognition based on graph neural network for multi-class classification of psychiatric disorders
Pub Date: 2024-12-21 | DOI: 10.1016/j.inffus.2024.102887
Yaqin Li, Yihong Dong, Shoubo Peng, Linlin Gao, Yu Xin
Open-set recognition (OSR) addresses the challenge that arises when classes not seen during model training appear in the test set. This issue is particularly critical in the medical field due to incomplete data collection and the continuous emergence of new and rare diseases. Medical OSR techniques must not only classify known cases accurately but also detect unknown cases and forward the corresponding information to experts for further diagnosis. However, there is a significant research gap in the current medical OSR field, which lacks both methods for OSR in psychiatric disorders and detailed procedures for neuroimaging-based OSR evaluation. To address the challenges associated with the OSR of psychiatric disorders, we propose a method named the open-set risk collaborative consistency graph neural network (ORC-GNN). First, functional connectivity (FC) is used to extract measurable representations in the deep feature space by coordinating hemispheric and whole-brain networks, thereby achieving multi-level brain network feature fusion and regional communication. Subsequently, these representations are used to guide the model to adaptively learn the decision boundaries for known classes using instance-level density awareness and to identify samples outside these boundaries as unknown. We introduce a novel open-risk margin loss (ORML) to balance empirical risk and open-space risk; this approach makes open-space risk quantifiable through the introduction of an open-risk term. We evaluate our method using an integrated multi-class dataset and a tailored experimental protocol suited to psychiatric disorder-related OSR challenges. Compared to state-of-the-art techniques, ORC-GNN demonstrates significant performance improvements and yields clinically interpretable information regarding the shared and distinct characteristics of multiple psychiatric disorders.
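The general idea of balancing empirical risk with a quantifiable open-space term can be sketched as follows. This is a generic prototype-distance margin formulation for illustration only, not the paper's ORML; the class prototypes, learnable radius, and weighting factor are all assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MarginOpenSetLoss(nn.Module):
    """Sketch: classify known samples by distance to class prototypes and add a
    penalty for embeddings drifting beyond a learnable radius (open-space term)."""
    def __init__(self, num_classes, feat_dim, lam=0.1):
        super().__init__()
        self.prototypes = nn.Parameter(torch.randn(num_classes, feat_dim))
        self.radius = nn.Parameter(torch.tensor(1.0))
        self.lam = lam

    def forward(self, feats, labels):
        dists = torch.cdist(feats, self.prototypes)          # (batch, num_classes)
        empirical = F.cross_entropy(-dists, labels)           # empirical risk on known classes
        intra = dists.gather(1, labels.unsqueeze(1)).squeeze(1)
        open_space = F.relu(intra - self.radius).mean()       # risk beyond the margin is quantified
        return empirical + self.lam * open_space

if __name__ == "__main__":
    loss_fn = MarginOpenSetLoss(num_classes=4, feat_dim=64)
    feats, labels = torch.randn(8, 64), torch.randint(0, 4, (8,))
    print(loss_fn(feats, labels).item())
```

At test time, a sample whose minimum prototype distance exceeds the learned radius would be flagged as unknown and routed to an expert, which mirrors the detect-and-defer workflow described above.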
{"title":"ORC-GNN: A novel open set recognition based on graph neural network for multi-class classification of psychiatric disorders","authors":"Yaqin Li, Yihong Dong, Shoubo Peng, Linlin Gao, Yu Xin","doi":"10.1016/j.inffus.2024.102887","DOIUrl":"https://doi.org/10.1016/j.inffus.2024.102887","url":null,"abstract":"Open-set recognition (OSR) refers to the challenge of introducing classes not seen during model training into the test set. This issue is particularly critical in the medical field due to incomplete data collection and the continuous emergence of new and rare diseases. Medical OSR techniques necessitate not only the accurate classification of known cases but also the ability to detect unknown cases and send the corresponding information to experts for further diagnosis. However, there is a significant research gap in the current medical OSR field, which not only lacks research methods for OSR in psychiatric disorders, but also lacks detailed procedures for OSR evaluation based on neuroimaging. To address the challenges associated with the OSR of psychiatric disorders, we propose a method named the open-set risk collaborative consistency graph neural network (ORC-GNN). First, functional connectivity (FC) is used to extract measurable representations in the deep feature space by coordinating hemispheric and whole-brain networks, thereby achieving multi-level brain network feature fusion and regional communication. Subsequently, these representations are used to guide the model to adaptively learn the decision boundaries for known classes using the instance-level density awareness and to identify samples outside these boundaries as unknown. We introduce a novel open-risk margin loss (ORML) to balance empirical risk and open-space risk; this approach makes open-space risk quantifiable through the introduction of open-risk term. We evaluate our method using an integrated multi-class dataset and a tailored experimental protocol suited for psychiatric disorder-related OSR challenges. Compared to state-of-the-art techniques, ORC-GNN demonstrates significant performance improvements and yields important clinically interpretative information regarding the shared and distinct characteristics of multiple psychiatric disorders.","PeriodicalId":50367,"journal":{"name":"Information Fusion","volume":"33 1","pages":""},"PeriodicalIF":18.6,"publicationDate":"2024-12-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142901776","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Enhanced detection of early Parkinson’s disease through multi-sensor fusion on smartphone-based IoMT platforms
Pub Date: 2024-12-21 | DOI: 10.1016/j.inffus.2024.102889
Tongyue He, Junxin Chen, M. Shamim Hossain, Zhihan Lyu
To date, Parkinson’s disease (PD) remains an incurable neurological disorder, and the period of quality life can only be extended through early detection and timely intervention. However, the symptoms of early PD are both heterogeneous and subtle. To cope with these challenges, we develop a two-level fusion framework for smart healthcare, leveraging smartphones interconnected with the Internet of Medical Things and exploring the contribution of multi-sensor and multi-activity data. Rotation rate and acceleration during walking are recorded with the gyroscope and accelerometer, location coordinates and acceleration during tapping are collected via the touch screen and accelerometer, and voice signals are captured by the microphone. The main scientific contribution is the enhanced fusion of multi-sensor information to cope with the heterogeneous and subtle nature of early PD symptoms, achieved by a first-level component that fuses features within a single activity using an attention mechanism and a second-level component that dynamically allocates weights across activities. Compared with related works, the proposed framework explores the potential of fusing multi-sensor data within a single activity and mines the importance of the different activities that correspond to early PD symptoms. The proposed two-level fusion framework achieves an AUC of 0.891 (95% CI, 0.860–0.921) and a sensitivity of 0.950 (95% CI, 0.888–1.000) in early PD detection, demonstrating that it efficiently fuses information from different sensor data across activities and has strong fault tolerance to the data.
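The two-level structure can be pictured with a small PyTorch sketch: attention fuses sensor features within each activity, and learned weights combine the resulting activity embeddings. The number of activities and sensors, the feature dimension, and the module names are assumptions for illustration, not the paper's architecture.

```python
import torch
import torch.nn as nn

class TwoLevelFusion(nn.Module):
    """Sketch: level 1 weighs sensor features within one activity via attention;
    level 2 combines activity embeddings with softmax-normalized weights."""
    def __init__(self, num_activities=3, dim=64):
        super().__init__()
        self.sensor_attn = nn.Linear(dim, 1)                  # scores each sensor within an activity
        self.activity_weights = nn.Parameter(torch.zeros(num_activities))
        self.classifier = nn.Linear(dim, 1)

    def forward(self, x):
        # x: (batch, activities, sensors, dim), e.g. walking / tapping / voice recordings
        attn = torch.softmax(self.sensor_attn(x), dim=2)       # level 1: weights over sensors
        activity_emb = (attn * x).sum(dim=2)                   # (batch, activities, dim)
        w = torch.softmax(self.activity_weights, dim=0)        # level 2: weights over activities
        fused = (w.view(1, -1, 1) * activity_emb).sum(dim=1)   # (batch, dim)
        return torch.sigmoid(self.classifier(fused))           # probability of early PD

if __name__ == "__main__":
    x = torch.randn(4, 3, 2, 64)   # 4 subjects, 3 activities, 2 sensor streams each
    print(TwoLevelFusion()(x).shape)  # torch.Size([4, 1])
```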
{"title":"Enhanced detection of early Parkinson’ s disease through multi-sensor fusion on smartphone-based IoMT platforms","authors":"Tongyue He, Junxin Chen, M. Shamim Hossain, Zhihan Lyu","doi":"10.1016/j.inffus.2024.102889","DOIUrl":"https://doi.org/10.1016/j.inffus.2024.102889","url":null,"abstract":"To date, Parkinson’s disease (PD) is an incurable neurological disorder, and the time of quality life can only be extended through early detection and timely intervention. However, the symptoms of early PD are both heterogeneous and subtle. To cope with these challenges, we develop a two-level fusion framework for smart healthcare, leveraging smartphones interconnected with the Internet of Medical Things and exploring the contribution of multi-sensor and multi-activity data. Rotation rate and acceleration during walking activity are recorded with the gyroscope and accelerometer, while location coordinates and acceleration during tapping activity are collected via the touch screen and accelerometer, and voice signals are captured by the microphone. The main scientific contribution is the enhanced fusion of multi-sensor information to cope with the heterogeneous and subtle nature of early PD symptoms, achieved by a first-level component that fuses features within a single activity using an attention mechanism and a second-level component that dynamically allocates weights across activities. Compared with related works, the proposed framework explores the potential of fusing multi-sensor data within a single activity, and mines the importance of different activities that correspond to early PD symptoms. The proposed two-level fusion framework achieves an AUC of 0.891 (95 % CI, 0.860–0.921) and a sensitivity of 0.950 (95 % CI, 0.888–1.000) in early PD detection, demonstrating that it efficiently fuses information from different sensor data for various activities and has a strong fault tolerance for data.","PeriodicalId":50367,"journal":{"name":"Information Fusion","volume":"166 1","pages":""},"PeriodicalIF":18.6,"publicationDate":"2024-12-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142902106","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Hierarchical disturbance and Group Inference for video-based visible-infrared person re-identification
Pub Date: 2024-12-21 | DOI: 10.1016/j.inffus.2024.102882
Chuhao Zhou, Yuzhe Zhou, Tingting Ren, Huafeng Li, Jinxing Li, Guangming Lu
Video-based Visible-Infrared person Re-identification (VVI-ReID) is challenging due to the large inter-view and inter-modal discrepancies. To alleviate these discrepancies, most existing works focus only on whole images, while identity-related partial information is ignored. Furthermore, the inference decision is commonly based on the similarity of two samples. However, a semantic gap between the query and gallery samples inevitably exists due to their inter-view misalignment, regardless of whether the modality gap is removed. In this paper, we propose a Hierarchical Disturbance (HD) and Group Inference (GI) method to handle the aforementioned issues. Specifically, the HD module models the inter-view and inter-modal discrepancies as multiple image styles and conducts feature disturbances by partially transferring body styles. By hierarchically taking partial and global features into account, our model is capable of adaptively achieving invariant yet identity-related features. Additionally, instead of establishing similarity between the query sample and each gallery sample independently, the GI module is introduced to extract complementary information from all potential intra-class gallery samples of a given query sample, which boosts performance on matching hard samples. Extensive experiments substantiate the superiority of our method compared with state-of-the-art approaches.
FLEX: Flexible Federated Learning Framework
Pub Date: 2024-12-20 | DOI: 10.1016/j.inffus.2024.102792
F. Herrera, D. Jiménez-López, A. Argente-Garrido, N. Rodríguez-Barroso, C. Zuheros, I. Aguilera-Martos, B. Bello, M. García-Márquez, M.V. Luzón
In the realm of Artificial Intelligence (AI), the need for privacy and security in data processing has become paramount. As AI applications continue to expand, the collection and handling of sensitive data raise concerns about individual privacy protection. Federated Learning (FL) emerges as a promising solution to address these challenges by enabling decentralized model training on local devices, thus preserving data privacy. This paper introduces FLEX: a FLEXible Federated Learning Framework designed to provide maximum flexibility in FL research experiments and the possibility of deploying federated solutions. By offering customizable features for data distribution, privacy parameters, and communication strategies, FLEX empowers researchers to innovate and develop novel FL techniques. It also provides a distributed version that allows experiments to be deployed on different devices. The framework additionally includes libraries for specific FL implementations, covering (1) anomaly detection, (2) blockchain, (3) adversarial attacks and defenses, (4) natural language processing, and (5) decision trees, enhancing its versatility and applicability in various domains. Overall, FLEX represents a significant advancement in FL research and deployment, facilitating the development of robust and efficient FL applications.
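As a frame of reference for what such a framework orchestrates, here is one plain federated-averaging round in NumPy. This is generic FedAvg for illustration and deliberately does not use FLEX's actual API; the client model (a linear regressor) and the training hyperparameters are assumptions.

```python
import numpy as np

def local_update(weights, client_data, lr=0.1, epochs=5):
    """Sketch: each client fits a linear regressor locally by gradient descent."""
    X, y = client_data
    w = weights.copy()
    for _ in range(epochs):
        grad = X.T @ (X @ w - y) / len(y)
        w -= lr * grad
    return w

def federated_round(global_weights, clients):
    """One FedAvg round: train on each client, then average by local dataset size."""
    local_weights = [local_update(global_weights, data) for data in clients]
    sizes = np.array([len(data[1]) for data in clients], dtype=float)
    return np.average(local_weights, axis=0, weights=sizes)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    true_w = np.array([2.0, -1.0])
    clients = []
    for n in (30, 50, 20):  # three clients with private, differently sized datasets
        X = rng.normal(size=(n, 2))
        clients.append((X, X @ true_w + 0.01 * rng.normal(size=n)))
    w = np.zeros(2)
    for _ in range(20):
        w = federated_round(w, clients)
    print(w)  # approaches [2, -1] without any client sharing raw data
```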
{"title":"FLEX: Flexible Federated Learning Framework","authors":"F. Herrera, D. Jiménez-López, A. Argente-Garrido, N. Rodríguez-Barroso, C. Zuheros, I. Aguilera-Martos, B. Bello, M. García-Márquez, M.V. Luzón","doi":"10.1016/j.inffus.2024.102792","DOIUrl":"https://doi.org/10.1016/j.inffus.2024.102792","url":null,"abstract":"In the realm of Artificial Intelligence (AI), the need for privacy and security in data processing has become paramount. As AI applications continue to expand, the collection and handling of sensitive data raise concerns about individual privacy protection. Federated Learning (FL) emerges as a promising solution to address these challenges by enabling decentralized model training on local devices, thus preserving data privacy. This paper introduces FLEX: a FLEXible Federated Learning Framework designed to provide maximum flexibility in FL research experiments and the possibility to deploy federated solutions. By offering customizable features for data distribution, privacy parameters, and communication strategies, FLEX empowers researchers to innovate and develop novel FL techniques. It also provides a distributed version that allows experiments to be deployed on different devices. The framework also includes libraries for specific FL implementations including: (1) anomalies, (2) blockchain, (3) adversarial attacks and defenses, (4) natural language processing and (5) decision trees, enhancing its versatility and applicability in various domains. Overall, FLEX represents a significant advancement in FL research and deployment, facilitating the development of robust and efficient FL applications.","PeriodicalId":50367,"journal":{"name":"Information Fusion","volume":"50 1","pages":""},"PeriodicalIF":18.6,"publicationDate":"2024-12-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142901838","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Analysis of Expressed and Private Opinions (EPOs) models: Improving self-cognitive dissonance and releasing cumulative pressure in group decision-making systems
Pub Date: 2024-12-20 | DOI: 10.1016/j.inffus.2024.102881
Jianglin Dong, Yiyi Zhao, Haixia Mao, Ya Yin, Jiangping Hu
For group decision-making problems, the existing expressed and private opinions (EPOs) models focus on analyzing the limiting discrepancy between agents’ EPOs and the disagreement among agents’ private opinions under social pressure. However, they fail to consider the self-cognitive dissonance arising from the discrepancy between agents’ EPOs, or from agents’ mismatched opinions and behaviors, as well as the impact of cumulative pressure. This study proposes a novel EPOs model in which private opinions are updated by inferring the private opinions of social neighbors from their explicit behaviors, whereas expressed opinions are updated by minimizing the current social pressure. The proposed prevention and remedy mechanisms effectively address agents’ self-cognitive dissonance from different psychological perspectives. Additionally, to realize the release of the cumulative pressure, two threshold models grounded in the psychological concepts of self-persuasion and liberating effects are presented. The simulation results indicate that the proposed EPOs model effectively avoids self-cognitive dissonance in a real social network. Finally, after the release of the cumulative pressure, the group EPOs achieve consensus under the self-persuasion effect or polarization under the liberating effect, demonstrating the feasibility and applicability of the proposal.
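To make the expressed/private split concrete, here is a minimal update rule in the spirit of expressed-versus-private opinion dynamics. The specific update equations, susceptibility, and conformity parameters are illustrative assumptions, not the model proposed in the paper.

```python
import numpy as np

def epo_step(private, expressed, adjacency, susceptibility=0.5, conformity=0.6):
    """Sketch: private opinions drift toward neighbors' *expressed* opinions,
    while expressed opinions compromise between one's private opinion and the
    local average, reducing the social pressure of standing out."""
    deg = adjacency.sum(axis=1, keepdims=True)
    neighbor_avg = (adjacency @ expressed[:, None] / np.maximum(deg, 1)).ravel()
    new_private = (1 - susceptibility) * private + susceptibility * neighbor_avg
    new_expressed = (1 - conformity) * new_private + conformity * neighbor_avg
    return new_private, new_expressed

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    n = 6
    A = (rng.random((n, n)) < 0.5).astype(float)
    np.fill_diagonal(A, 0)
    private = rng.random(n)
    expressed = private.copy()
    for _ in range(30):
        private, expressed = epo_step(private, expressed, A)
    print(np.round(private, 3), np.round(expressed, 3))
```

The gap between `private` and `expressed` in such a simulation is exactly the kind of discrepancy that the paper's dissonance-prevention and pressure-release mechanisms are designed to control.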
{"title":"Analysis of Expressed and Private Opinions (EPOs) models: Improving self-cognitive dissonance and releasing cumulative pressure in group decision-making systems","authors":"Jianglin Dong, Yiyi Zhao, Haixia Mao, Ya Yin, Jiangping Hu","doi":"10.1016/j.inffus.2024.102881","DOIUrl":"https://doi.org/10.1016/j.inffus.2024.102881","url":null,"abstract":"For group decision-making problems, the existing expressed and private opinions (EPOs) models focus on analyzing the limiting discrepancy between agents’ EPOs and the disagreement among agents’ private opinions under social pressure. However, they failed to consider the self-cognitive dissonance phenomenon arising from the discrepancy between agents’ EPOs or agents’ mismatched opinions and behaviors, as well as the impact of the cumulative pressure. This study proposes a novel EPOs model that updates private opinions by inferring the private opinions of social neighbors from their explicit behaviors, whereas expressed opinions updated by minimizing current social pressure. The proposed prevention and remedy mechanisms effectively address agents’ self-cognitive dissonance from different psychological perspectives. Additionally, to realize the release of the cumulative pressure, two threshold models grounded in the concepts of the self-persuasion and liberating effects in psychology are presented. The simulation results indicate that the proposed EPOs model effectively avoids the self-cognitive dissonance in a real social network. Finally, after the release of the cumulative pressure, the group EPOs will achieve a consensus under the self-persuasion effect or polarization under the liberating effect, demonstrating the feasibility and applicability of the proposal.","PeriodicalId":50367,"journal":{"name":"Information Fusion","volume":"26 1","pages":""},"PeriodicalIF":18.6,"publicationDate":"2024-12-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142901779","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
InferTrans: Hierarchical structural fusion transformer for crowded human pose estimation
Pub Date: 2024-12-20 | DOI: 10.1016/j.inffus.2024.102878
Muyu Li, Yingfeng Wang, Henan Hu, Xudong Zhao
Human pose estimation in crowded scenes presents unique challenges due to frequent occlusions and complex interactions between individuals. To address these issues, we introduce InferTrans, a hierarchical structural fusion Transformer designed to improve crowded human pose estimation. InferTrans integrates semantic features into structural information using a hierarchical joint-limb-semantic fusion module. By reorganizing joints and limbs into a tree structure, the fusion module facilitates effective information exchange across different structural levels and leverages both global structural information and local contextual details. Furthermore, we explicitly model limb structural patterns separately from joints, treating limbs as vectors with defined lengths and orientations. This allows our model to infer complete human poses from minimal input, significantly enhancing pose refinement tasks. Extensive experiments on multiple datasets demonstrate that InferTrans outperforms existing pose estimation techniques in crowded and occluded scenarios. The proposed InferTrans serves as a robust post-processing technique and is capable of improving the accuracy and robustness of pose estimation in challenging environments.
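The limb-as-vector representation can be illustrated directly: given joint coordinates, each limb reduces to a length and an orientation, and a full pose can be rebuilt from a root joint plus those limb vectors. The toy skeleton and joint indices below are assumptions for illustration, not the paper's joint-limb-semantic tree.

```python
import numpy as np

# toy skeleton: (parent joint, child joint) index pairs, assumed for illustration
LIMBS = [(0, 1), (1, 2), (1, 3), (2, 4), (3, 5)]

def limbs_as_vectors(joints):
    """Convert joint coordinates (num_joints, 2) into per-limb (length, orientation)."""
    feats = []
    for parent, child in LIMBS:
        vec = joints[child] - joints[parent]
        feats.append((np.linalg.norm(vec), np.arctan2(vec[1], vec[0])))
    return np.array(feats)                      # (num_limbs, 2)

def joints_from_limbs(root, limb_feats):
    """Inverse direction: rebuild joints from a root position plus limb vectors,
    which is what lets a model infer a full pose from partial, structured input."""
    joints = {0: np.asarray(root, dtype=float)}
    for (parent, child), (length, angle) in zip(LIMBS, limb_feats):
        joints[child] = joints[parent] + length * np.array([np.cos(angle), np.sin(angle)])
    return np.stack([joints[i] for i in sorted(joints)])

if __name__ == "__main__":
    pose = np.array([[0, 0], [0, 1], [-0.5, 1.8], [0.5, 1.8], [-0.6, 2.6], [0.6, 2.6]], float)
    feats = limbs_as_vectors(pose)
    print(np.allclose(joints_from_limbs(pose[0], feats), pose))  # True
```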
{"title":"InferTrans: Hierarchical structural fusion transformer for crowded human pose estimation","authors":"Muyu Li, Yingfeng Wang, Henan Hu, Xudong Zhao","doi":"10.1016/j.inffus.2024.102878","DOIUrl":"https://doi.org/10.1016/j.inffus.2024.102878","url":null,"abstract":"Human pose estimation in crowded scenes presents unique challenges due to frequent occlusions and complex interactions between individuals. To address these issues, we introduce InferTrans, a hierarchical structural fusion Transformer designed to improve crowded human pose estimation. InferTrans integrates semantic features into structural information using a hierarchical joint-limb-semantic fusion module. By reorganizing joints and limbs into a tree structure, the fusion module facilitates effective information exchange across different structural levels, and leverage both global structural information and local contextual details. Furthermore, we explicitly model limb structural patterns separately from joints, treating limbs as vectors with defined lengths and orientations. This allows our model to infer complete human poses from minimal input, significantly enhancing pose refinement tasks. Extensive experiments on multiple datasets demonstrate that InferTrans outperforms existing pose estimation techniques in crowded and occluded scenarios. The proposed InferTrans serves as a robust post-processing technique, and is capable of improving the accuracy and robustness of pose estimation in challenging environments.","PeriodicalId":50367,"journal":{"name":"Information Fusion","volume":"202 1","pages":""},"PeriodicalIF":18.6,"publicationDate":"2024-12-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142901778","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
CDF-DSR: Learning continuous depth field for self-supervised RGB-guided depth map super resolution
Pub Date: 2024-12-19 | DOI: 10.1016/j.inffus.2024.102884
Siyuan Zhang, Jingxian Dong, Yan Ma, Hongsen Cai, Meijie Wang, Yan Li, Twaha B. Kabika, Xin Li, Wenguang Hou
RGB-guided depth map super-resolution (GDSR) is a pivotal multimodal fusion task aimed at enhancing low-resolution (LR) depth maps using corresponding high-resolution (HR) RGB images as guidance. Existing approaches largely rely on supervised deep learning techniques, which are often hampered by limited generalization capabilities due to the challenges in collecting varied RGB-D datasets. To address this, we introduce a novel self-supervised paradigm that achieves depth map super-resolution utilizing just a single RGB-D sample, without any additional training data. Considering that scene depths are typically continuous, the proposed method conceptualizes the GDSR task as reconstructing a continuous depth field for each RGB-D sample. The depth field is represented as a neural network-based mapping from image coordinates to depth values, and optimized by leveraging the available HR RGB image and the LR depth map. Meanwhile, a novel cross-modal geometric consistency loss is proposed to enhance the detail accuracy of the depth field. Experimental results across multiple datasets demonstrate that the proposed method offers superior generalization compared to state-of-the-art GDSR methods and shows remarkable performance in practical applications. The test code is available at: https://github.com/zsy950116/CDF-DSR.
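The continuous-field formulation can be sketched as a small coordinate MLP fitted to a single sample: the network maps normalized pixel coordinates to depth, and it is optimized so that its high-resolution rendering, once downsampled, matches the observed low-resolution depth map. The network size and the simple downsampling-consistency loss below are assumptions for illustration, and the paper's RGB guidance and cross-modal geometric consistency loss are omitted.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DepthField(nn.Module):
    """Sketch: map normalized (x, y) image coordinates to a depth value."""
    def __init__(self, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(2, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, coords):                 # coords: (N, 2) in [0, 1]
        return self.net(coords)

def fit_single_sample(lr_depth, scale=4, steps=200):
    """Fit the field so that rendering it at high resolution and average-pooling
    reproduces the observed low-resolution depth map (one sample, no training set)."""
    h, w = lr_depth.shape
    H, W = h * scale, w * scale
    ys, xs = torch.meshgrid(torch.linspace(0, 1, H), torch.linspace(0, 1, W), indexing="ij")
    coords = torch.stack([xs, ys], dim=-1).reshape(-1, 2)
    field = DepthField()
    opt = torch.optim.Adam(field.parameters(), lr=1e-3)
    for _ in range(steps):
        hr = field(coords).reshape(1, 1, H, W)
        loss = F.mse_loss(F.avg_pool2d(hr, scale), lr_depth[None, None])
        opt.zero_grad()
        loss.backward()
        opt.step()
    return field

if __name__ == "__main__":
    field = fit_single_sample(torch.rand(16, 16))  # queryable at any resolution afterwards
```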
{"title":"CDF-DSR: Learning continuous depth field for self-supervised RGB-guided depth map super resolution","authors":"Siyuan Zhang, Jingxian Dong, Yan Ma, Hongsen Cai, Meijie Wang, Yan Li, Twaha B. Kabika, Xin Li, Wenguang Hou","doi":"10.1016/j.inffus.2024.102884","DOIUrl":"https://doi.org/10.1016/j.inffus.2024.102884","url":null,"abstract":"RGB-guided depth map super-resolution (GDSR) is a pivotal multimodal fusion task aimed at enhancing low-resolution (LR) depth maps using corresponding high-resolution (HR) RGB images as guidance. Existing approaches largely rely on supervised deep learning techniques, which are often hampered by limited generalization capabilities due to the challenges in collecting varied RGB-D datasets. To address this, we introduce a novel self-supervised paradigm that achieves depth map super-resolution utilizing just a single RGB-D sample, without any additional training data. Considering that scene depths are typically continuous, the proposed method conceptualizes the GDSR task as reconstructing a continuous depth field for each RGB-D sample. The depth field is represented as a neural network-based mapping from image coordinates to depth values, and optimized by leveraging the available HR RGB image and the LR depth map. Meanwhile, a novel cross-modal geometric consistency loss is proposed to enhance the detail accuracy of the depth field. Experimental results across multiple datasets demonstrate that the proposed method offers superior generalization compared to state-of-the-art GDSR methods and shows remarkable performance in practical applications. The test code is available at: <ce:inter-ref xlink:href=\"https://github.com/zsy950116/CDF-DSR\" xlink:type=\"simple\">https://github.com/zsy950116/CDF-DSR</ce:inter-ref>.","PeriodicalId":50367,"journal":{"name":"Information Fusion","volume":"359 1","pages":""},"PeriodicalIF":18.6,"publicationDate":"2024-12-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142874845","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
ADF-OCT: An advanced Assistive Diagnosis Framework for study-level macular optical coherence tomography
Pub Date: 2024-12-18 | DOI: 10.1016/j.inffus.2024.102877
Weihao Gao, Wangting Li, Dong Fang, Zheng Gong, Chucheng Chen, Zhuo Deng, Fuju Rong, Lu Chen, Lujia Feng, Canfeng Huang, Jia Liang, Yijing Zhuang, Pengxue Wei, Ting Xie, Zhiyuan Niu, Fang Li, Xianling Tang, Bing Zhang, Zixia Zhou, Shaochong Zhang, Lan Ma
Optical coherence tomography (OCT) is an advanced retinal imaging technique that enables non-invasive cross-sectional visualization of the retina, playing a crucial role in ophthalmology for detecting various macular lesions. While deep learning has shown promise in OCT image analysis, existing studies have primarily focused on broad, image-level disease diagnosis. This study introduces the Assistive Diagnosis Framework for OCT (ADF-OCT), which utilizes a dataset of over one million macular OCT images to construct a multi-label diagnostic model for common macular lesions and a medical report generation module. Our Multi-frame Medical Images Distillation method effectively translates study-level multi-label annotations into image-level annotations, thereby enhancing diagnostic performance without additional annotation information. This approach significantly improves diagnostic accuracy for multi-label classification, achieving an AUROC of 0.9891, with a best macro F1 of 0.8533 and an accuracy of 0.9411. By refining the feature fusion strategy in multi-frame medical imaging, our framework substantially enhances the generation of medical reports for OCT B-scans, surpassing current solutions. This research presents an advanced development pipeline that utilizes existing clinical datasets to provide more accurate and comprehensive artificial intelligence-assisted diagnoses for macular OCT.
Diff-PC: Identity-preserving and 3D-aware controllable diffusion for zero-shot portrait customization
Pub Date: 2024-12-12 | DOI: 10.1016/j.inffus.2024.102869
Yifang Xu, Benxiang Zhai, Chenyu Zhang, Ming Li, Yang Li, Sidan Du
Portrait customization (PC) has recently garnered significant attention due to its potential applications. However, existing PC methods lack precise identity (ID) preservation and face control. To address these issues, we propose Diff-PC, a diffusion-based framework for zero-shot PC, which generates realistic portraits with high ID fidelity, specified facial attributes, and diverse backgrounds. Specifically, our approach employs a 3D face predictor to reconstruct 3D-aware facial priors encompassing the reference ID, target expressions, and poses. To capture fine-grained face details, we design an ID-Encoder that fuses local and global face features. Subsequently, we devise ID-Ctrl, which uses the 3D face to guide the alignment of ID features. We further introduce an ID-Injector to enhance ID fidelity and facial controllability. Finally, training on our collected ID-centric dataset improves face similarity and text-to-image (T2I) alignment. Extensive experiments demonstrate that Diff-PC surpasses state-of-the-art methods in ID preservation, face control, and T2I consistency. Notably, face similarity improves by about +3% on all datasets. Furthermore, our method is compatible with multi-style foundation models.
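The local-plus-global fusion behind such an ID encoder can be pictured with a small sketch: a global face embedding is concatenated with an attention-pooled summary of local patch features to form one identity representation for conditioning a generator. The dimensions, learned query token, and module names are illustrative assumptions rather than the paper's encoder.

```python
import torch
import torch.nn as nn

class LocalGlobalIDEncoder(nn.Module):
    """Sketch: fuse a global face embedding with pooled local patch features
    into a single identity representation."""
    def __init__(self, global_dim=512, local_dim=256, out_dim=512):
        super().__init__()
        self.local_pool = nn.MultiheadAttention(local_dim, num_heads=4, batch_first=True)
        self.query = nn.Parameter(torch.randn(1, 1, local_dim))   # learned token that summarizes patches
        self.proj = nn.Linear(global_dim + local_dim, out_dim)

    def forward(self, global_feat, local_feats):
        # global_feat: (batch, global_dim); local_feats: (batch, num_patches, local_dim)
        q = self.query.expand(local_feats.size(0), -1, -1)
        local_summary, _ = self.local_pool(q, local_feats, local_feats)
        return self.proj(torch.cat([global_feat, local_summary.squeeze(1)], dim=-1))

if __name__ == "__main__":
    enc = LocalGlobalIDEncoder()
    print(enc(torch.randn(2, 512), torch.randn(2, 49, 256)).shape)  # torch.Size([2, 512])
```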
{"title":"Diff-PC: Identity-preserving and 3D-aware controllable diffusion for zero-shot portrait customization","authors":"Yifang Xu, Benxiang Zhai, Chenyu Zhang, Ming Li, Yang Li, Sidan Du","doi":"10.1016/j.inffus.2024.102869","DOIUrl":"https://doi.org/10.1016/j.inffus.2024.102869","url":null,"abstract":"Portrait customization (PC) has recently garnered significant attention due to its potential applications. However, existing PC methods lack precise identity (ID) preservation and face control. To address these tissues, we propose <ce:bold>Diff-PC</ce:bold>, a <ce:bold>diff</ce:bold>usion-based framework for zero-shot <ce:bold>PC</ce:bold>, which generates realistic portraits with high ID fidelity, specified facial attributes, and diverse backgrounds. Specifically, our approach employs the 3D face predictor to reconstruct the 3D-aware facial priors encompassing the reference ID, target expressions, and poses. To capture fine-grained face details, we design ID-Encoder that fuses local and global face features. Subsequently, we devise ID-Ctrl using the 3D face to guide the alignment of ID features. We further introduce ID-Injector to enhance ID fidelity and facial controllability. Finally, training on our collected ID-centric dataset improves face similarity and text-to-image (T2I) alignment. Extensive experiments demonstrate that Diff-PC surpasses state-of-the-art methods in ID preservation, face control, and T2I consistency. Notably, the face similarity improves by about +3% on all datasets. Furthermore, our method is compatible with multi-style foundation models.","PeriodicalId":50367,"journal":{"name":"Information Fusion","volume":"22 1","pages":""},"PeriodicalIF":18.6,"publicationDate":"2024-12-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142825406","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}