Pub Date: 2026-07-01 | Epub Date: 2026-01-20 | DOI: 10.1016/j.inffus.2026.104164
Yunfei Guo
The proliferation of urban big data presents unprecedented opportunities for understanding cities, yet the analytical methods to harness this data are often fragmented and domain-specific. Existing predictive models in urban computing are typically highly specialized, creating analytical silos that inhibit knowledge transfer and are difficult to adapt across domains such as public safety, housing, and transport. This paper confronts this critical gap by developing a generalizable, multimodal spatio-temporal deep learning framework engineered for both high predictive performance and interpretability, capable of mastering diverse urban prediction tasks without architectural modification. The hybrid architecture fuses a Multi-Head Graph Convolutional Network (GCN) for spatial diffusion, a Long Short-Term Memory (LSTM) network for temporal dynamics, and a learnable Gating Mechanism that weights the influence of the spatial graph versus static external features. To validate this generalizability, the framework was tested on three distinct urban domains in London: crime forecasting, housing price estimation, and transport network demand. The model outperformed traditional baselines (ARIMA, XGBoost) and state-of-the-art deep learning models (TabNet, TFT). Moreover, the framework moves beyond prediction to explanation by incorporating attention mechanisms and permutation feature importance analysis.
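The abstract does not give the gate's functional form; a minimal sketch of one plausible version, a sigmoid gate producing a per-dimension convex combination of GCN-derived spatial features and encoded static features (all names, shapes, and weights here are illustrative assumptions, not the authors' implementation), might look like:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gated_fusion(h_spatial, h_static, W, b):
    # Gate g in (0, 1) decides, per dimension, how much the spatial-graph
    # features outweigh the static external features.
    z = np.concatenate([h_spatial, h_static])
    g = sigmoid(W @ z + b)
    return g * h_spatial + (1.0 - g) * h_static

rng = np.random.default_rng(0)
d = 4
h_spatial = rng.normal(size=d)          # e.g. GCN output for one zone (illustrative)
h_static = rng.normal(size=d)           # e.g. encoded census/land-use features
W = rng.normal(size=(d, 2 * d)) * 0.1   # gate weights, learned in practice
b = np.zeros(d)
fused = gated_fusion(h_spatial, h_static, W, b)
```

Because the gate output lies strictly in (0, 1), each fused coordinate stays between the two input coordinates, which is what makes the learned weighting directly interpretable.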
Title: "Multimodal spatio-temporal fusion: A generalizable GCN-LSTM with attention framework for urban application" (Information Fusion, vol. 131, Article 104164)
Pub Date: 2026-07-01 | Epub Date: 2026-01-29 | DOI: 10.1016/j.inffus.2026.104193
Shunlei Li , Longsen Gao , Jin Wang , Chang Che , Xi Xiao , Jiuwen Cao , Yingbai Hu , Hamid Reza Karimi
Teaching robots dexterous skills from human videos remains challenging due to the reliance on low-level trajectory imitation, which fails to generalize across object types, spatial layouts, and manipulator configurations. We propose Graph-Fused Vision-Language-Action (GF-VLA), a framework that enables dual-arm robotic systems to perform task-level reasoning and execution directly from RGB(-D) human demonstrations. GF-VLA first extracts Shannon-information-based cues to identify hands and objects with the highest task relevance, then encodes these cues into temporally ordered scene graphs that capture both hand-object and object-object interactions. These graphs are fused with a language-conditioned transformer that generates hierarchical behavior trees and interpretable Cartesian motion commands. To improve execution efficiency in bimanual settings, we further introduce a cross-hand selection policy that infers optimal gripper assignment without explicit geometric reasoning. We evaluate GF-VLA on four structured dual-arm block assembly tasks involving symbolic shape construction and spatial generalization. Experimental results show that the information-theoretic scene representation achieves over 95% graph accuracy and 93% subtask segmentation accuracy, supporting the LLM planner in generating reliable and human-readable task policies. When executed by the dual-arm robot, these policies yield 94% grasp success, 89% placement accuracy, and 90% overall task success across stacking, letter-building, and geometric reconfiguration scenarios, demonstrating strong generalization and robustness across diverse spatial and semantic variations.
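The Shannon-information-based cue selection and temporally ordered scene graphs are only described at a high level here; the hypothetical sketch below uses sequence entropy as a crude task-relevance proxy and a plain per-step edge list as the graph (all track names and states are invented for illustration):

```python
import math
from collections import Counter

def shannon_entropy(labels):
    # Entropy (bits) of a discrete observation sequence; higher entropy is
    # used here as a crude proxy for task-relevant activity.
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

# Hypothetical per-frame contact states for three tracked entities.
tracks = {
    "red_block":  ["free", "grasped", "grasped", "placed", "free"],
    "blue_block": ["free", "free", "free", "free", "free"],
    "left_hand":  ["open", "closing", "closed", "open", "open"],
}
# Rank entities by activity; a static object scores zero.
ranked = sorted(tracks, key=lambda k: shannon_entropy(tracks[k]), reverse=True)

# A temporally ordered scene graph: (subject, relation, object) edges per step.
scene_graphs = [
    {"t": 1, "edges": [("left_hand", "grasps", "red_block")]},
    {"t": 3, "edges": [("red_block", "on_top_of", "blue_block")]},
]
```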
Title: "Information-theoretic graph fusion with vision-language-action model for policy reasoning and dual robotic control" (Information Fusion, vol. 131, Article 104193)
Pub Date: 2026-07-01 | Epub Date: 2026-01-29 | DOI: 10.1016/j.inffus.2026.104195
Hulin Kuang , Bin Hu , Shuai Yang , Dongcui Wang , Guanghua Luo , Weihua Liao , Wu Qiu , Shulin Liu , Jianxin Wang
Acute ischemic stroke (AIS) outcome prediction is crucial for treatment decisions. However, AIS outcome prediction is challenging due to the combined influence of lesion characteristics, vascular status, and other health conditions. In this study, we introduce a vision-language model with a Siamese bilateral difference network and a text-guided image feature enhancement module for predicting AIS outcomes (e.g., the modified Rankin Scale, mRS) on CT angiography. In the Siamese bilateral difference network, built by fine-tuning the foundation model LVM-Med, we design an interactive Transformer fine-tuning encoder and a vision-question-answering-guided bilateral difference awareness module, which generates bilateral difference text via image-text pair question answering as a prompt to enhance the extracted brain vascular difference features. Additionally, in the text-guided image feature enhancement module, we propose a text feature extraction module to extract patient phrase-level and inter-phrase embeddings from clinical notes, and employ a multi-scale image-text interaction module to obtain fine-grained phrase-enhanced and coarse-grained phrase-context-aware image attention features. We validate our model on the public ISLES2024 dataset, a private dataset A, and an external AIS dataset. It achieves accuracies of 81.11%, 83.05%, and 80.00% and AUCs of 80.06%, 85.48%, and 82.62% for 90-day mRS prediction on the three datasets, respectively, outperforming several state-of-the-art methods and demonstrating its generalization ability. Moreover, the proposed method can be effectively extended to glaucoma visual field progression prediction, which is also related to vascular differences and clinical notes.
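One common way to realize a bilateral (left-right) difference signal of the kind the Siamese branch presumably exploits is to mirror the volume across the sagittal midline and subtract; this is a generic sketch under that assumption, not the paper's actual network:

```python
import numpy as np

def bilateral_difference(volume):
    # Left-right asymmetry map: subtract the volume from its mirror across
    # the sagittal midline (last axis assumed to be the left-right axis).
    mirrored = np.flip(volume, axis=-1)
    return volume - mirrored

vol = np.zeros((2, 2, 4))
vol[..., 0] = 1.0          # hypothetical one-sided signal (e.g. reduced filling)
diff = bilateral_difference(vol)
asymmetry_score = float(np.abs(diff).mean())   # 0 for a perfectly symmetric brain
```

A healthy, symmetric input yields an all-zero map, so downstream features concentrate on pathological asymmetry.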
Title: "Vision-language model with siamese bilateral difference network and text-guided image feature enhancement for acute ischemic stroke outcome prediction on CT angiography" (Information Fusion, vol. 131, Article 104195)
Pub Date: 2026-07-01 | Epub Date: 2026-01-16 | DOI: 10.1016/j.inffus.2026.104155
Daniel M. Jimenez-Gutierrez , Yelizaveta Falkouskaya , José L. Hernandez-Ramos , Aris Anagnostopoulos , Ioannis Chatzigiannakis , Andrea Vitaletti
Federated Learning (FL) is an emerging distributed machine learning paradigm enabling multiple clients to train a global model collaboratively without sharing their raw data. While FL enhances data privacy by design, it remains vulnerable to various security and privacy threats. This survey provides a comprehensive overview of 203 papers on state-of-the-art attacks and the defense mechanisms developed to address them, categorizing the latter into security-enhancing and privacy-preserving techniques. Security-enhancing methods aim to improve FL robustness against malicious behaviors such as Byzantine attacks, poisoning, and Sybil attacks, while privacy-preserving techniques focus on protecting sensitive data through cryptographic approaches, differential privacy, and secure aggregation. We critically analyze the strengths and limitations of existing methods, highlight the trade-offs between privacy, security, and model performance, and discuss the implications of non-IID data distributions for the effectiveness of these defenses. Furthermore, we identify open research challenges and future directions, including the need for scalable, adaptive, and energy-efficient solutions operating in dynamic and heterogeneous FL environments. Our survey aims to guide researchers and practitioners in developing robust and privacy-preserving FL systems, fostering advances that safeguard the integrity and confidentiality of collaborative learning frameworks.
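As a concrete taste of the security-enhancing methods surveyed, the coordinate-wise median is a classic Byzantine-robust replacement for mean-based FedAvg aggregation; the toy example below (synthetic two-parameter updates, one malicious client) shows why the mean is dragged arbitrarily far while the median survives:

```python
import numpy as np

def fedavg(updates):
    # Standard mean aggregation: a single outlier can shift it arbitrarily.
    return np.mean(updates, axis=0)

def coordinate_median(updates):
    # Byzantine-robust alternative: the coordinate-wise median tolerates a
    # minority of arbitrarily corrupted client updates.
    return np.median(updates, axis=0)

honest = [np.array([1.0, 2.0]), np.array([1.1, 1.9]), np.array([0.9, 2.1])]
poisoned = honest + [np.array([100.0, -100.0])]   # one malicious client

mean_agg = fedavg(poisoned)             # badly corrupted
median_agg = coordinate_median(poisoned)  # stays near the honest consensus
```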
Title: "On the security and privacy of federated learning: A survey with attacks, defenses, frameworks, applications, and future directions" (Information Fusion, vol. 131, Article 104155)
Pub Date: 2026-07-01 | Epub Date: 2026-01-18 | DOI: 10.1016/j.inffus.2026.104124
Gabriel Oduori , Chaira Cocco , Payam Sajadi , Francesco Pilla
Data fusion (DF) addresses the challenge of integrating heterogeneous data sources to improve decision-making and inference. Although DF has been widely explored, no prior systematic review has specifically focused on its application to low-cost sensor (LCS) data in environmental monitoring. To address this gap, we conduct a systematic literature review (SLR) following the PRISMA framework, synthesising findings from 82 peer-reviewed articles. The review addresses three key questions: (1) What fusion methodologies are employed in conjunction with LCS data? (2) In what environmental contexts are these methods applied? (3) What are the methodological challenges and research gaps? Our analysis reveals that geostatistical and machine learning approaches dominate current practice, with air quality monitoring emerging as the primary application domain. Additionally, artificial intelligence (AI)-based methods are increasingly used to integrate spatial, temporal, and multimodal data. However, limitations persist in uncertainty quantification, validation standards, and the generalisability of fusion frameworks. This review provides a comprehensive synthesis of current techniques and outlines key directions for future research, including the development of robust, uncertainty-aware fusion methods and broader application to less-studied environmental variables.
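A building block underlying many of the geostatistical fusion methods the review covers is inverse-variance (precision) weighting of co-located readings; the sketch below, with made-up PM2.5 numbers for a reference monitor and two low-cost sensors, shows how the fused estimate favors the more precise instrument and how the fused variance never exceeds the best input's:

```python
import numpy as np

def precision_weighted_fusion(values, variances):
    # Fuse independent noisy readings of the same quantity by
    # inverse-variance weighting (a basic Kalman/geostatistics building block).
    w = 1.0 / np.asarray(variances, dtype=float)
    fused = float(np.sum(w * np.asarray(values, dtype=float)) / np.sum(w))
    fused_var = float(1.0 / np.sum(w))
    return fused, fused_var

# Hypothetical PM2.5 readings (ug/m3): reference monitor, then two low-cost sensors.
fused, var = precision_weighted_fusion([12.0, 15.0, 10.0], [1.0, 9.0, 9.0])
```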
Title: "Data fusion for low-cost sensors: A systematic literature review" (Information Fusion, vol. 131, Article 104124)
Pub Date: 2026-07-01 | Epub Date: 2026-01-29 | DOI: 10.1016/j.inffus.2026.104198
Dawei Zhang , Chenglin Sang , Tianyi Lyu
Knee osteoarthritis (KOA) is a common degenerative joint disease, and accurate diagnosis and severity grading are crucial for effective treatment. Although deep learning techniques based on X-rays or magnetic resonance imaging (MRI) have greatly improved diagnostic accuracy, two-dimensional images often cannot fully capture the complex three-dimensional morphology and texture changes associated with KOA. To address these challenges, we propose a shape-aware osteoarthritis diagnostic network, a novel bidirectional cross-modal fusion framework that integrates 3D point clouds and MRI sequences. The framework consists of three parts: (1) a local-relation-aware dynamic graph convolutional neural network (CNN) that extracts complex geometric features from point clouds representing the surfaces of knee-joint bones and cartilage; (2) a sequence aggregation method for MRI that combines a 2D CNN for spatial feature extraction with a self-attention mechanism across slice sequences; and (3) a bidirectional cross-modal fusion module that conducts in-depth interactive feature learning between the geometric domain of the point clouds and the texture and spatiotemporal domain of the MRI, enabling the two modalities to refine and enhance each other's representations. Extensive experiments on the large Osteoarthritis Initiative (OAI) cohort show that our model achieves state-of-the-art performance. Its accuracy on the challenging 5-level Kellgren-Lawrence (KL) classification is 0.73, an improvement of approximately 23.7% over the 0.59 achieved by using 3D shape features alone in the ShapeMed-Knee benchmark. Furthermore, its AUC for binary OA diagnosis is 0.95, significantly better than existing unimodal and multimodal baselines.
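The bidirectional cross-modal fusion module is described only at a high level; a minimal sketch of one standard realization, two-way dot-product attention with residual connections between point-cloud and MRI token sets (token counts and dimensions invented, no learned projections), is:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def bidirectional_cross_attention(pc_feats, mri_feats):
    # Each modality attends to the other: point-cloud tokens gather MRI
    # context and vice versa, so both representations are mutually enhanced.
    scores = pc_feats @ mri_feats.T                    # (n_pc, n_mri) affinities
    pc_enh = pc_feats + softmax(scores, axis=1) @ mri_feats
    mri_enh = mri_feats + softmax(scores.T, axis=1) @ pc_feats
    return pc_enh, mri_enh

rng = np.random.default_rng(1)
pc = rng.normal(size=(5, 8))    # 5 point-cloud tokens, feature dim 8
mri = rng.normal(size=(3, 8))   # 3 MRI slice tokens, same dim
pc_enh, mri_enh = bidirectional_cross_attention(pc, mri)
```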
Title: "Shape-aware osteoarthritis network: Bidirectional fusion of MRI and 3D point clouds for knee osteoarthritis diagnosis" (Information Fusion, vol. 131, Article 104198)
Pub Date: 2026-07-01 | Epub Date: 2026-01-23 | DOI: 10.1016/j.inffus.2026.104174
Zhijing Huang , Wen-Jue He , Baotian Hu, Zheng Zhang
Due to its strong capacity for integrating heterogeneous multi-source information, multimodal sentiment analysis (MSA) has achieved remarkable progress in affective computing. However, existing methods typically adopt symmetric fusion strategies that treat all modalities equally, overlooking their inherent performance disparities: some modalities excel at discriminative representation, while others carry underutilized supportive cues. This limitation leads to insufficient exploration of cross-modal complementary correlations. To address this issue, we propose a novel Grading-Inspired Complementary Enhancing (GCE) framework for MSA, one of the first attempts to conduct dynamic assessment for knowledge transfer in progressive multimodal fusion and cooperation. Specifically, based on cross-modal interaction, a task-aware grading mechanism categorizes modality-pair associations into dominant (high-performing) and supplementary (low-performing) branches according to their task performance. Accordingly, a relation filtering module selectively identifies trustworthy information from the dominant branch to enhance consistency exploration in supplementary modality pairs with minimized redundancy. Afterwards, a weight adaptation module dynamically adjusts the guiding weight of individual samples for adaptability and generalization. Extensive experiments conducted on three benchmark datasets show that our proposed GCE approach outperforms state-of-the-art MSA methods. Our code is available at https://github.com/hka-7/GCEforMSA.
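The task-aware grading mechanism can be illustrated with a toy threshold rule: score each modality pair on a held-out task metric and split at the mean (the scores and the mean-threshold choice below are illustrative assumptions, not the paper's exact criterion):

```python
def grade_modality_pairs(pair_scores, threshold=None):
    # Split modality pairs into dominant vs. supplementary branches by
    # held-out task performance; threshold defaults to the mean score.
    if threshold is None:
        threshold = sum(pair_scores.values()) / len(pair_scores)
    dominant = {p for p, s in pair_scores.items() if s >= threshold}
    supplementary = set(pair_scores) - dominant
    return dominant, supplementary

# Hypothetical validation accuracies for three modality pairs.
scores = {("text", "audio"): 0.81, ("text", "video"): 0.78, ("audio", "video"): 0.62}
dominant, supplementary = grade_modality_pairs(scores)
```

The dominant branch would then supply the trustworthy signals used to guide the supplementary pairs.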
Title: "Grading-inspired complementary enhancing for multimodal sentiment analysis" (Information Fusion, vol. 131, Article 104174)
Pub Date: 2026-07-01 | Epub Date: 2026-01-20 | DOI: 10.1016/j.inffus.2026.104165
Shengyingjie Liu , Jianxin Li , Qian Wan , Bo He , Zhijun Huang , Qing Li
Programming education is essential for equipping individuals with digital literacy skills and developing the problem-solving abilities necessary for success in the modern workforce. In online programming tutoring systems, knowledge tracing (KT) techniques are crucial for programming prediction, as they monitor user performance and model user cognition. However, both universal and programming-specific KT methods depend on traditional state-driven paradigms that indirectly predict programming outcomes based on users' knowledge states. This does not align with the core objective of programming prediction, which is to determine whether the submitted code can solve the question. To address this, we present code-driven feature fusion KT (CFKT), which integrates large language models (LLMs) and encoders for both individualized and common code features. It consists of two modules: pass prediction and code prediction. The pass prediction module leverages an LLM to incorporate semantic information from the question and code through embeddings, extracting key features that determine code correctness through proxy tasks and effectively narrowing the solution space with vectorization. The code prediction module integrates a user's historical data and data from other users through feature fusion blocks, allowing accurate predictions for submitted code and effectively mitigating the cold-start problem. Experiments on multiple real-world public programming datasets demonstrate that CFKT significantly outperforms existing baseline methods.
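As a highly simplified stand-in for the pass prediction module, one can blend a semantic match score between question and code embeddings with the user's historical pass rate; the toy embeddings and the linear blending rule below are illustrative assumptions, not CFKT's actual design:

```python
import numpy as np

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def predict_pass(code_emb, question_emb, user_hist_rate, alpha=0.5):
    # Blend a code-question semantic match score (mapped from [-1, 1] to
    # [0, 1]) with the user's historical pass rate; both terms stand in for
    # the paper's LLM-derived features and feature fusion blocks.
    match = (cosine(code_emb, question_emb) + 1.0) / 2.0
    return alpha * match + (1.0 - alpha) * user_hist_rate

q = np.array([1.0, 0.0, 1.0])      # hypothetical question embedding
code = np.array([1.0, 0.1, 0.9])   # hypothetical submission embedding
p = predict_pass(code, q, user_hist_rate=0.6)
```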
{"title":"Code-driven programming prediction enhanced by LLM with a feature fusion approach","authors":"Shengyingjie Liu , Jianxin Li , Qian Wan , Bo He , Zhijun Huang , Qing Li","doi":"10.1016/j.inffus.2026.104165","DOIUrl":"10.1016/j.inffus.2026.104165","url":null,"abstract":"<div><div>Programming education is essential for equipping individuals with digital literacy skills and developing the problem-solving abilities necessary for success in the modern workforce. In online programming tutoring systems, knowledge tracing (KT) techniques are crucial for programming prediction, as they monitor user performance and model user cognition. However, both universal and programming-specific knowledge transfer methods depend on traditional state-driven paradigms that indirectly predict programming outcomes based on users’ knowledge states. It does not align with the core objective of programming prediction, which is to determine whether submitted code can solve the question. To address this, we present the code-driven feature fusion KT (CFKT), which integrates large language models (LLM) and encoders for both individualized and common code features. It consists of two modules: pass prediction and code prediction. The pass prediction module leverages LLM to incorporate semantic information from the question and code through embedding, extracting key features that determine code correctness through proxy tasks and effectively narrowing the solution space with vectorization. The code prediction module integrates user historical data and data from other users through feature fusion blocks, allowing for accurate predictions of submitted code and effectively mitigating the cold start problem. 
Experiments on multiple real-world public programming datasets demonstrate that CFKT significantly outperforms existing baseline methods.</div></div>","PeriodicalId":50367,"journal":{"name":"Information Fusion","volume":"131 ","pages":"Article 104165"},"PeriodicalIF":15.5,"publicationDate":"2026-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146014543","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
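The pass-prediction idea described above (fusing LLM embeddings of the question and the submitted code, then scoring correctness) can be sketched minimally as follows. This is an illustration only: the embedding dimensions, random stand-in embeddings, and the single logistic layer are hypothetical, not CFKT's actual architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

def fuse_and_predict(q_emb, c_emb, W, b):
    """Toy pass-prediction head: concatenate question and code
    embeddings (stand-ins for LLM embeddings) and apply a single
    logistic layer to score whether the code would pass."""
    x = np.concatenate([q_emb, c_emb])
    z = W @ x + b
    return 1.0 / (1.0 + np.exp(-z))  # sigmoid -> pass probability in (0, 1)

d = 8                              # hypothetical embedding size
q = rng.normal(size=d)             # stand-in question embedding
c = rng.normal(size=d)             # stand-in code embedding
W = rng.normal(size=(1, 2 * d)) * 0.1
b = np.zeros(1)
p = fuse_and_predict(q, c, W, b)
print(float(p[0]))
```

In a real system the two embeddings would come from an LLM encoder and the head would be trained on pass/fail labels; the sketch only shows the fusion-then-score shape of the module.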
Pub Date: 2026-07-01 | Epub Date: 2026-01-30 | DOI: 10.1016/j.inffus.2026.104194
Yunlong He, Fei Chen, Hanlin Zhang, Jia Yu
When personalized federated learning meets crowdsourced label annotation, it can potentially form a complete ecosystem spanning large-scale data labeling, model training on massive numbers of devices, and flexible service for diverse end users. In practice, however, crowdsourced annotators rarely follow a uniform annotation convention and instead label data in their own way. Even when they share a consistent perception of the underlying categories, their label annotations can still be expressed in various ways. This situation is especially serious in the federated learning scenario, where the diverse label expressions are kept locally on distributed clients for privacy reasons and can hardly be unified. In this work, we propose CrowdFed, a systematic solution for crowdsourced federated learning systems with an underlying label representation skew issue. Specifically, a global model is trained through federated learning for global categorical alignment, and personalized layers are learned through an auxiliary network on each client for local representation alignment. Furthermore, a category-level similarity matching strategy is presented to align inconsistent label representations between local categories and global categories. Evaluated on four benchmark datasets, the proposed strategy demonstrates its superiority in terms of system efficiency and cost.
{"title":"Crowdsourced federated learning with inconsistent label representation","authors":"Yunlong He, Fei Chen, Hanlin Zhang, Jia Yu","doi":"10.1016/j.inffus.2026.104194","DOIUrl":"10.1016/j.inffus.2026.104194","url":null,"abstract":"<div><div>When personalized federated learning meets crowdsourced label annotation, it can potentially form a complete ecosystem from large-scale data labeling, through model training in massive devices, toward flexible service for diverse end users. Actually, most common crowdsourced annotators can hardly follow a uniform annotation regulation and make the annotations in their own way. Even though they can share the cognitive consistency on the perception, the label annotation can still be expressed in various ways. This situation can be specifically serious in the federated learning scenario, in which the diverse label expressions are always kept locally in distributed clients for privacy concerns and can hardly be unified. In this work, we are motivated to propose CrowdFed, a systematic solution for crowdsourced federated learning systems with underlying label representation skew issue. Specifically, the global model is trained through federated learning for global categorical alignment, and the personalized layers are learned through an auxiliary network in each client for local representation alignment. Furthermore, a category-level similarity matching strategy is presented for the alignment of inconsistent label representations between the local category and the global category. 
Evaluated by four benchmark datasets, our proposed strategy proves its superiority in terms of system efficiency and cost.</div></div>","PeriodicalId":50367,"journal":{"name":"Information Fusion","volume":"131 ","pages":"Article 104194"},"PeriodicalIF":15.5,"publicationDate":"2026-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146089494","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
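The category-level similarity matching step described above can be illustrated with a minimal sketch: map each client-local label representation to its most similar global category by cosine similarity. The prototype vectors and the plain argmax matching here are assumptions for illustration, not CrowdFed's exact procedure.

```python
import numpy as np

def match_categories(local_protos, global_protos):
    """Map each client-local label representation to the most similar
    global category via cosine similarity (a simplified stand-in for
    category-level similarity matching)."""
    ln = local_protos / np.linalg.norm(local_protos, axis=1, keepdims=True)
    gn = global_protos / np.linalg.norm(global_protos, axis=1, keepdims=True)
    sim = ln @ gn.T            # (n_local, n_global) cosine similarities
    return sim.argmax(axis=1)  # best-matching global category per local label

# Toy prototypes: local labels 0 and 1 are noisy copies of global 2 and 0.
g = np.eye(3)
l = np.array([[0.1, 0.0, 0.9],
              [0.9, 0.1, 0.0]])
mapping = match_categories(l, g)
print(mapping)  # → [2 0]
```

A real deployment would derive the prototypes from learned feature representations rather than hand-set vectors, but the matching shape is the same: similarity matrix, then per-local-label assignment.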
Pub Date: 2026-07-01 | Epub Date: 2026-01-17 | DOI: 10.1016/j.inffus.2026.104157
Xunqi Zhou , Zhenqi Zhang , Zifeng Wu , Qianming Wang , Jing Teng , Jinlong Liu , Yongjie Zhai
In intelligent vehicle damage assessment, component recognition faces challenges such as significant intra-class variability and minimal inter-class differences, which hinder detection, as well as occlusions and ambiguous boundaries, which complicate segmentation. We generalize these problems into three core aspects: inter-object relational modeling, semantic-detail information balancing, and occlusion-aware decoupling. To this end, we propose the Adaptive Regularized Topological Segmentation (ARTSeg) network, comprising three complementary modules: Inter-Class Graph Constraint (ICGC), Constrained Detail Feature Backtracking (CDFB), and Topological Decoupling Segmentation (TDS). Each module is purposefully designed, integrated in a progressive structure, and synergistically reinforces the others to enhance overall performance. Specifically, ICGC clusters intra-class features and establishes implicit topological constraints among categories during feature extraction, enabling the model to better capture inter-class relationships and improve detection representation. Subsequently, CDFB evaluates the impact of channel-wise feature information within each candidate region on segmentation accuracy and computational cost, dynamically selecting appropriate feature resolutions for individual instances while balancing the demands of detection and segmentation tasks. Finally, TDS introduces topological associations between occluded and occluding regions at the feature level and decouples them at the task level, explicitly modeling generalized occlusion regions and enhancing segmentation performance. We quantitatively and qualitatively evaluate ARTSeg on a 59-category vehicle component dataset constructed for insurance damage assessment, achieving notable improvements in addressing the aforementioned problems. Experiments on two public datasets, DSMLR and Carparts, further validate the generalization capability of the proposed method. 
Results indicate that ARTSeg provides practical guidance for component recognition in intelligent vehicle damage assessment.
{"title":"An adaptive regularized topological segmentation network integrating inter-class relations and occlusion information for vehicle component recognition","authors":"Xunqi Zhou , Zhenqi Zhang , Zifeng Wu , Qianming Wang , Jing Teng , Jinlong Liu , Yongjie Zhai","doi":"10.1016/j.inffus.2026.104157","DOIUrl":"10.1016/j.inffus.2026.104157","url":null,"abstract":"<div><div>In intelligent vehicle damage assessment, component recognition faces challenges such as significant intra-class variability and minimal inter-class differences, which hinder detection, as well as occlusions and ambiguous boundaries, which complicate segmentation. We generalize these problems into three core aspects: inter-object relational modeling, semantic-detail information balancing, and occlusion-aware decoupling. To this end, we propose the Adaptive Regularized Topological Segmentation (ARTSeg) network, comprising three complementary modules: Inter-Class Graph Constraint (ICGC), Constrained Detail Feature Backtracking (CDFB), and Topological Decoupling Segmentation (TDS). Each module is purposefully designed, integrated in a progressive structure, and synergistically reinforces the others to enhance overall performance. Specifically, ICGC clusters intra-class features and establishes implicit topological constraints among categories during feature extraction, enabling the model to better capture inter-class relationships and improve detection representation. Subsequently, CDFB evaluates the impact of channel-wise feature information within each candidate region on segmentation accuracy and computational cost, dynamically selecting appropriate feature resolutions for individual instances while balancing the demands of detection and segmentation tasks. Finally, TDS introduces topological associations between occluded and occluding regions at the feature level and decouples them at the task level, explicitly modeling generalized occlusion regions and enhancing segmentation performance. 
We quantitatively and qualitatively evaluate ARTSeg on a 59-category vehicle component dataset constructed for insurance damage assessment, achieving notable improvements in addressing the aforementioned problems. Experiments on two public datasets, DSMLR and Carparts, further validate the generalization capability of the proposed method. Results indicate that ARTSeg provides practical guidance for component recognition in intelligent vehicle damage assessment.</div></div>","PeriodicalId":50367,"journal":{"name":"Information Fusion","volume":"131 ","pages":"Article 104157"},"PeriodicalIF":15.5,"publicationDate":"2026-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145995203","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
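The intra-class clustering with inter-class constraints that ICGC performs can be sketched as a simple loss: pull features toward their class centroid and keep distinct class centroids at least a margin apart. This is a hedged simplification with toy 2-D features and a hypothetical margin, not the paper's actual graph-based formulation.

```python
import numpy as np

def icgc_like_loss(feats, labels, margin=1.0):
    """Toy constraint in the spirit of ICGC: intra-class compactness
    (mean distance to class centroid) plus a hinge penalty when two
    class centroids are closer than `margin`."""
    classes = np.unique(labels)
    cents = np.stack([feats[labels == c].mean(axis=0) for c in classes])
    intra = np.mean([np.linalg.norm(feats[labels == c] - cents[i], axis=1).mean()
                     for i, c in enumerate(classes)])
    inter, pairs = 0.0, 0
    for i in range(len(classes)):
        for j in range(i + 1, len(classes)):
            d = np.linalg.norm(cents[i] - cents[j])
            inter += max(0.0, margin - d)  # hinge: penalize nearby centroids
            pairs += 1
    return intra + inter / max(pairs, 1)

# Two tight, well-separated clusters: hinge term vanishes, loss = intra term.
feats = np.array([[0.0, 0.0], [0.2, 0.0], [5.0, 5.0], [5.2, 5.0]])
labels = np.array([0, 0, 1, 1])
loss = icgc_like_loss(feats, labels)
print(round(loss, 3))
```

Minimizing such a term during feature extraction encourages compact per-class clusters with separated centroids, which is the effect the abstract attributes to ICGC at a high level.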