
Latest publications in Information Fusion

Optimizing the environmental design and management of public green spaces: Analyzing urban infrastructure and long-term user experience with a focus on streetlight density in the city of Las Vegas, NV
IF 18.6 | CAS Tier 1, Computer Science | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2025-01-18 | DOI: 10.1016/j.inffus.2024.102914
Xiwei Shen, Jie Kong, Yang Song, Xinyi Wang, Grant Mosey
In Las Vegas and many other desert cities, the unique climatic conditions, marked by high daytime temperatures, naturally encourage residents to seek outdoor recreation during the cooler evening hours. However, streetlight management has been less than optimal, leading to inadequate illumination in public parks after dark. This lack of proper lighting compromises not only the safety but also the enjoyment of these spaces at night, a time when they could offer a much-needed respite from summer heat. Recent scholarship has highlighted how poorly designed street lighting deters park usage, pointing to a broader issue in urban planning: infrastructure must be adapted to local climates for the benefit of public health and well-being. This study contributes to the existing scholarship on park lighting by drawing on diverse data sources and constructing longitudinal measures to examine how population behaviors in urban parks vary over time across locations. Using a time fixed-effects method, it explores how park users' demographics, particularly variations across race and income levels, and the density of street lighting affect the nighttime usage of public green spaces. It aims to understand how demographic diversity among park users and the physical environment, specifically street-lighting density, shape patterns of nighttime activity in public parks. Building on this analysis, we develop an improved predictive model for determining street-lighting density in public green spaces by comparing multiple types of machine learning models. The model considers the demographic diversity of users and the observed patterns of nighttime usage, with the goal of enhancing the accessibility, safety, and utilization of these spaces during nighttime hours.
This research contributes to the broader objective of creating resilient, healthy, and inclusive cities that support the well-being of their residents.
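The time fixed-effects design the abstract refers to can be illustrated with a minimal sketch: regress the outcome on the covariates plus one dummy variable per time period, so that shocks common to all parks in a period are absorbed and the coefficient on streetlight density is identified from within-period variation. The data and variable names below are invented for illustration and are not the study's actual specification.

```python
import numpy as np

def time_fixed_effects(y, x, t):
    """OLS of y on x plus one dummy per time period (time fixed effects).

    y: (n,) outcome, e.g. nighttime park visits
    x: (n, k) covariates, e.g. streetlight density, demographics
    t: (n,) integer time-period labels
    Returns the k coefficients on x, net of period-specific shocks.
    """
    periods = np.unique(t)
    dummies = (t[:, None] == periods[None, :]).astype(float)  # one column per period
    X = np.hstack([x, dummies])                               # dummies replace the intercept
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta[:x.shape[1]]

# Toy panel: visits rise with light density, plus a common shock per period.
rng = np.random.default_rng(0)
t = np.repeat(np.arange(4), 50)              # 4 periods, 50 parks each
light = rng.uniform(0, 10, size=200)
shock = np.array([5.0, -2.0, 1.0, 3.0])[t]   # period effects
visits = 2.0 * light + shock + rng.normal(0, 0.1, 200)
coef = time_fixed_effects(visits, light[:, None], t)
print(round(coef[0], 2))  # recovers the true effect of 2.0
```

Because the period dummies soak up anything that varies only over time (heat waves, events), the estimated light-density effect is not confounded by those common shocks.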
Citations: 0
DF-BSFNet: A bilateral synergistic fusion network with novel dynamic flow convolution for robust road extraction
IF 18.6 | CAS Tier 1, Computer Science | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2025-01-15 | DOI: 10.1016/j.inffus.2025.102958
Chong Zhang, Huazu Zhang, Xiaogang Guo, Heng Qi, Zilong Zhao, Luliang Tang
Accurate and robust road extraction with good continuity and completeness is crucial for the development of smart cities and intelligent transportation. Remote sensing images and vehicle trajectories are attractive data sources with rich, complementary multimodal road information, and their fusion promises to significantly improve road-extraction performance. However, existing fusion-based road-extraction studies suffer from two problems: the feature extraction modules pay little attention to the inherent morphology of roads, and the multimodal feature fusion techniques are too simple and superficial to fully and efficiently exploit the complementary information from different data sources, resulting in road predictions with poor continuity and limited performance. To this end, we propose a Bilateral Synergistic Fusion network with novel Dynamic Flow convolution, termed DF-BSFNet, which fully leverages the complementary road information from images and trajectories in a dual-mutual adaptive guidance and incremental refinement manner. First, we propose a novel Dynamic Flow Convolution (DFConv) that more adeptly and deliberately captures the elongated, winding "flow" morphology of roads in complex scenarios, providing flexible and powerful capabilities for learning detail-rich and robust road feature representations. Second, we develop two parallel modality-specific feature extractors with DFConv to extract hierarchical road features specific to images and trajectories, effectively exploiting the distinctive advantages of each modality. Third, we propose a Bilateral Synergistic Adaptive Feature Fusion (BSAFF) module that synthesizes the global and local context of complementary multimodal road information and achieves sophisticated feature fusion with dynamic guided propagation and dual-mutual refinement.
Extensive experiments on three road datasets demonstrate that our DF-BSFNet outperforms current state-of-the-art methods by a large margin in terms of continuity and accuracy.
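To illustrate the general idea of adaptive multimodal feature fusion (a toy stand-in, not the paper's BSAFF module, whose internals are not given here), a gated combination of an image-branch and a trajectory-branch feature map might look like the sketch below; the gate weights `w_gate` are a hypothetical placeholder for parameters that would be learned end-to-end.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gated_fusion(feat_img, feat_traj, w_gate):
    """Toy adaptive fusion: a gate decides, per spatial position, how much
    to trust the image branch versus the trajectory branch.

    feat_img, feat_traj: (H, W, C) feature maps from the two modalities
    w_gate: (2*C,) hypothetical gate weights (learned in a real network)
    """
    stacked = np.concatenate([feat_img, feat_traj], axis=-1)  # (H, W, 2C)
    gate = sigmoid(stacked @ w_gate)[..., None]               # (H, W, 1) in (0, 1)
    return gate * feat_img + (1.0 - gate) * feat_traj

rng = np.random.default_rng(1)
img, traj = rng.normal(size=(8, 8, 16)), rng.normal(size=(8, 8, 16))
fused = gated_fusion(img, traj, rng.normal(size=32))
print(fused.shape)  # (8, 8, 16)
```

The point of such content-dependent gating, as opposed to plain concatenation or averaging, is that occluded image regions can defer to trajectory evidence and sparsely traveled roads can defer to imagery.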
Citations: 0
KDFuse: A high-level vision task-driven infrared and visible image fusion method based on cross-domain knowledge distillation
IF 18.6 | CAS Tier 1, Computer Science | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2025-01-14 | DOI: 10.1016/j.inffus.2025.102944
Chenjia Yang, Xiaoqing Luo, Zhancheng Zhang, Zhiguo Chen, Xiao-jun Wu
To enhance the comprehensiveness of fusion features and meet the requirements of high-level vision tasks, some fusion methods attempt to coordinate the fusion process by interacting directly with high-level semantic features. However, because of the significant disparity between the high-level semantic domain and the fusion representation domain, such direct interaction leaves considerable room for improvement. To overcome this obstacle, a high-level vision task-driven infrared and visible image fusion method based on cross-domain knowledge distillation, referred to as KDFuse, is proposed. KDFuse brings multi-task perceptual representations into the same domain through cross-domain knowledge distillation. By enabling semantic and fusion information to interact at an equivalent level, it effectively narrows the gap between the semantic and fusion domains, enabling multi-task collaborative fusion. Specifically, to acquire the superior high-level semantic representations needed to instruct the fusion network, a teaching relationship is established via the multi-domain interaction distillation module (MIDM) to realize multi-task collaboration. The multi-scale semantic perception module (MSPM) is designed to learn to capture semantic information through cross-domain knowledge distillation, and the semantic detail integration module (SDIM) is constructed to integrate fusion-level semantic representations with fusion-level visual representations. Moreover, to balance semantic and visual representations during fusion, the Fourier transform is introduced into the loss function. Extensive experiments demonstrate the effectiveness of the proposed method in both image fusion and downstream tasks. The source code is available at https://github.com/lxq-jnu/KDFuse.
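Cross-domain knowledge distillation builds on the classic soft-label distillation loss: the student is trained to match the teacher's temperature-softened output distribution. A minimal NumPy sketch of that base loss (the generic Hinton-style formulation, not KDFuse's MIDM, which adds multi-domain interaction on top) is:

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def distillation_loss(student_logits, teacher_logits, T=4.0):
    """Soft-label distillation: KL divergence between temperature-softened
    teacher and student distributions, scaled by T^2 to keep gradient
    magnitudes comparable across temperatures."""
    p_t = softmax(teacher_logits / T)
    p_s = softmax(student_logits / T)
    kl = np.sum(p_t * (np.log(p_t) - np.log(p_s)), axis=-1)
    return (T ** 2) * kl.mean()

teacher = np.array([[2.0, 1.0, 0.1]])
loss_same = distillation_loss(teacher, teacher)                  # identical outputs: 0
loss_diff = distillation_loss(np.array([[0.1, 1.0, 2.0]]), teacher)
print(loss_same, loss_diff > loss_same)
```

A higher temperature T flattens both distributions, forcing the student to learn the teacher's relative ranking of wrong classes rather than just its argmax.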
Citations: 0
SelfFed: Self-adaptive Federated Learning with Non-IID data on Heterogeneous Edge Devices for Bias Mitigation and Enhance Training Efficiency
IF 18.6 | CAS Tier 1, Computer Science | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2025-01-10 | DOI: 10.1016/j.inffus.2025.102932
Neha Singh, Mainak Adhikari
Federated learning (FL) offers a decentralized and collaborative training solution on resource-constrained Edge Devices (EDs) to improve a global model without sharing raw data. Standard Synchronous FL (SFL) approaches provide significant advantages in data privacy and reduced communication overhead; however, they face several challenges, including non-independent and identically distributed (Non-IID) data, the presence of unlabeled data, biased aggregation due to device heterogeneity, and effective ED selection to handle stragglers. To tackle these challenges, we propose a new Self-adaptive Federated Learning (SelfFed) strategy that uses a masked loss function to handle unlabeled data, allowing EDs to concentrate on labeled data and enhancing training efficiency. Additionally, we integrate a novel quality-dependent aggregation solution to mitigate bias during model updates. This solution accurately reflects performance across Non-IID data distributions by incentivizing local EDs through a new Stackelberg game model, which rewards EDs based on their contributions to the global model and thereby keeps them motivated to participate and perform well. Finally, we incorporate a deep reinforcement learning technique into the SelfFed strategy for dynamic ED selection to handle straggler EDs; this technique adapts to changes in device performance and resources across iterations, fostering collaboration and sustained engagement. The performance of the SelfFed strategy is evaluated in a real-time SFL scenario (irrigation control in paddy fields) and on three benchmark datasets in a serverless private cloud environment. Comparative results against state-of-the-art approaches reveal that SelfFed significantly reduces CPU usage by 5%–6% and enhances training efficiency by 4%–8% while achieving 4%–6% higher accuracy.
Further, in the real-time scenario, the SelfFed improves CPU usage by 3%–5% and enhances training efficiency by 8%–10% with 5%–7% higher accuracy.
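The masked-loss idea, computing the loss only over labeled samples so that unlabeled edge data does not distort gradients, can be sketched as follows. The `ignore_index` convention for marking unlabeled samples is an illustrative assumption, not necessarily SelfFed's exact formulation.

```python
import numpy as np

def masked_cross_entropy(logits, labels, ignore_index=-1):
    """Cross-entropy that skips unlabeled samples (label == ignore_index),
    so partially labeled edge data still yields a well-defined loss."""
    mask = labels != ignore_index
    if not mask.any():
        return 0.0  # batch is entirely unlabeled: contribute nothing
    z = logits[mask] - logits[mask].max(axis=1, keepdims=True)
    log_probs = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(mask.sum()), labels[mask]].mean()

logits = np.array([[2.0, 0.5], [0.1, 3.0], [1.0, 1.0]])
labels = np.array([0, -1, 1])   # middle sample is unlabeled
loss = masked_cross_entropy(logits, labels)
print(loss > 0)  # only samples 0 and 2 contribute
```

Averaging over `mask.sum()` rather than the full batch size keeps the loss scale comparable across devices with different labeled fractions.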
Citations: 0
DEMO: A Dynamics-Enhanced Learning Model for multi-horizon trajectory prediction in autonomous vehicles
IF 18.6 | CAS Tier 1, Computer Science | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2025-01-09 | DOI: 10.1016/j.inffus.2024.102924
Chengyue Wang, Haicheng Liao, Kaiqun Zhu, Guohui Zhang, Zhenning Li
Autonomous vehicles (AVs) rely on accurate trajectory prediction of surrounding vehicles to ensure the safety of both passengers and other road users. Trajectory prediction spans short-term and long-term horizons, each requiring distinct considerations: short-term predictions depend on accurately capturing the vehicle's dynamics, while long-term predictions depend on accurately modeling the interaction patterns within the environment. However, current approaches, whether physics-based or learning-based, tend to ignore these distinct considerations, so they struggle to find optimal predictions across both horizons. In this paper, we introduce the Dynamics-Enhanced Learning MOdel (DEMO), a novel approach that combines a physics-based Vehicle Dynamics Model with advanced deep learning algorithms. DEMO employs a two-stage architecture consisting of a Dynamics Learning Stage, which captures vehicle motion dynamics, and an Interaction Learning Stage, which models interactions. By capitalizing on the respective strengths of both methods, DEMO produces multi-horizon predictions of future trajectories. Experimental results on the Next Generation Simulation (NGSIM), Macau Connected Autonomous Driving (MoCAD), Highway Drone (HighD), and nuScenes datasets demonstrate that DEMO outperforms state-of-the-art (SOTA) baselines over both short-term and long-term prediction horizons.
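The short-term, dynamics-driven half of the argument can be illustrated with the simplest possible physics baseline: a constant-velocity kinematic rollout. DEMO's actual Vehicle Dynamics Model is more elaborate; this sketch only shows why short horizons are well served by kinematics alone, before interaction effects dominate.

```python
import numpy as np

def constant_velocity_rollout(pos, vel, dt=0.1, steps=30):
    """Short-horizon physics baseline: roll a constant-velocity kinematic
    model forward. A toy stand-in for a vehicle dynamics model.

    pos, vel: (2,) current position [m] and velocity [m/s]
    Returns (steps, 2) future positions at t = dt, 2*dt, ..., steps*dt.
    """
    t = dt * np.arange(1, steps + 1)[:, None]  # (steps, 1) lookahead times
    return pos[None, :] + t * vel[None, :]

traj = constant_velocity_rollout(np.array([0.0, 0.0]), np.array([10.0, 0.5]))
print(traj[0], traj[-1])  # positions 0.1 s and 3.0 s ahead
```

Over a few hundred milliseconds such a model is hard to beat, but over several seconds lane changes and yielding behavior (interaction effects) make a pure kinematic extrapolation increasingly wrong, which is the gap the learning stage is meant to fill.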
Citations: 0
TMVF: Trusted Multi-View Fish Behavior Recognition with correlative feature and adaptive evidence fusion
IF 18.6 | CAS Tier 1, Computer Science | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2025-01-08 | DOI: 10.1016/j.inffus.2024.102899
Zhenxi Zhao, Xinting Yang, Chunjiang Zhao, Chao Zhou
Utilizing multi-view learning to analyze fish behavior is crucial for early warning of fish disease and for developing intelligent feeding strategies. Trusted multi-view classification based on Dempster–Shafer Theory (DST) can effectively resolve view conflicts and significantly improve accuracy. However, DST-based methods often assume that view source-domain data are "independent" and ignore the associations between different views, which can lead to inaccurate fusion and decision errors. To address this limitation, this paper proposes a Trusted Multi-View Fish (TMVF) Behavior Recognition Model that leverages adaptive fusion of associative feature evidence. TMVF employs a Multi-Source Composite Backbone (MSCB) at the feature level to integrate learning across different visual feature dimensions, providing non-independent feature vectors for deeper associative distribution learning. Additionally, a Trusted Association Multi-view (TAMV) Feature Fusion Module is introduced at the vector-evidence level. TAMV uses a cross-association fusion method to capture deeper associations between feature vectors rather than treating them as independent sources, and it employs a Dirichlet distribution for more reliable predictions, addressing conflicts between views. To validate TMVF's performance, a real-world Multi-view Fish Behavior Recognition dataset (MFBR) with top, underwater, and depth color views was constructed. Experimental results demonstrate TMVF's superior performance on both the SynDD2 and MFBR datasets. Notably, TMVF achieved an accuracy of 98.48% on SynDD2, surpassing the Frame-flexible network (FFN) by 9.94%. On the MFBR dataset, TMVF achieved an accuracy of 96.56% and an F1-macro score of 94.31%, outperforming I3d+resnet50 by 10.62% and 50.4%, and the FFN by 4.5% and 30.58%, respectively. This demonstrates the effectiveness of TMVF in multi-view tasks such as human and animal behavior recognition.
The code will be publicly available on GitHub (https://github.com/crazysboy/TMVF).
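The DST machinery underlying trusted multi-view classification is Dempster's rule of combination, which fuses two sources of evidence while renormalizing away their conflict. A minimal sketch for two mass functions follows; the fish-behavior hypotheses and mass values are invented for illustration.

```python
def dempster_combine(m1, m2):
    """Dempster's rule for two mass functions over the same frame of
    discernment. m1, m2: dicts mapping frozenset hypotheses to masses
    that each sum to 1. Conflicting mass is discarded and the rest
    renormalized."""
    combined, conflict = {}, 0.0
    for a, ma in m1.items():
        for b, mb in m2.items():
            inter = a & b
            if inter:
                combined[inter] = combined.get(inter, 0.0) + ma * mb
            else:
                conflict += ma * mb  # disjoint hypotheses: pure conflict
    if conflict >= 1.0:
        raise ValueError("total conflict: sources fully disagree")
    return {k: v / (1.0 - conflict) for k, v in combined.items()}

feeding, resting = frozenset({"feeding"}), frozenset({"resting"})
either = feeding | resting                     # "don't know" mass
view1 = {feeding: 0.7, either: 0.3}            # e.g. top-camera evidence
view2 = {feeding: 0.6, resting: 0.1, either: 0.3}
fused = dempster_combine(view1, view2)
print(round(fused[feeding], 3))  # agreement on "feeding" is reinforced
```

The "independence" assumption the abstract criticizes is visible here: the rule multiplies masses as if the two views were independent witnesses, which is exactly what correlated camera views violate.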
引用次数: 0
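The Dirichlet-evidence fusion that the abstract describes can be illustrated with a minimal subjective-logic sketch: each view's Dirichlet evidence is converted into per-class beliefs plus an uncertainty mass, and two views are fused with the reduced Dempster combination rule commonly used in trusted multi-view classification. This is an illustrative sketch of the general DST/Dirichlet machinery, not the authors' TMVF implementation; the evidence vectors and the 3-class setup are made-up toy values.

```python
import numpy as np

def evidence_to_opinion(evidence):
    # Subjective-logic opinion from Dirichlet evidence:
    # belief b_k = e_k / S, uncertainty u = K / S, with S = sum_k (e_k + 1).
    K = evidence.shape[-1]
    alpha = evidence + 1.0            # Dirichlet parameters
    S = alpha.sum()
    return evidence / S, K / S

def combine_opinions(b1, u1, b2, u2):
    # Reduced Dempster combination of two view opinions.
    conflict = np.sum(np.outer(b1, b2)) - np.sum(b1 * b2)  # mass on disagreeing classes
    scale = 1.0 - conflict
    b = (b1 * b2 + b1 * u2 + b2 * u1) / scale
    u = (u1 * u2) / scale
    return b, u

# Two 3-class views: one confident about class 0, one nearly uninformative.
b1, u1 = evidence_to_opinion(np.array([40.0, 1.0, 1.0]))
b2, u2 = evidence_to_opinion(np.array([2.0, 2.0, 2.0]))
b, u = combine_opinions(b1, u1, b2, u2)
print(b.argmax(), round(u, 3))  # fused belief favors class 0; uncertainty is below either view's
```

Fusing a confident opinion with an uncertain one leaves the confident class on top while shrinking the overall uncertainty mass, which is the behavior evidential fusion relies on when views conflict.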
Towards a robust multi-view information bottleneck using Cauchy–Schwarz divergence 基于Cauchy-Schwarz散度的鲁棒多视图信息瓶颈
IF 18.6 1区 计算机科学 Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date : 2025-01-08 DOI: 10.1016/j.inffus.2025.102934
Qi Zhang, Mingfei Lu, Jingmin Xin, Badong Chen
Efficiently preserving task-relevant information while removing noise and redundancy in multi-view data remains a core challenge. The information bottleneck principle offers an information-theoretic framework to compress data while retaining essential information for the task. However, estimating mutual information in high-dimensional spaces is computationally intractable. Commonly used variational methods introduce uncertainty and risk performance degradation. To overcome these limitations, we propose a robust deterministic multi-view information bottleneck framework that circumvents the need for variational inference or distributional assumptions. Specifically, we present a non-parametric mutual information estimation based on the Cauchy–Schwarz divergence, eliminating the need for auxiliary neural estimators and significantly simplifying the optimization of the information bottleneck. Leveraging this mutual information measure, we design a neural network framework that robustly compresses high-dimensional multi-view data into a low-dimensional representation, extracting task-relevant features that adhere to both sufficiency and minimality. Additionally, attention mechanisms are employed to fuse compact features across different views, capturing interdependencies and enhancing the integration of complementary information. This fusion process improves the robustness of the overall representation. Statistical analysis using the Nemenyi test shows statistically significant differences in performance between our method and existing algorithms, with a critical distance (CD = 1.856, p-value <0.05), demonstrating the superiority of our approach. Experimental results on synthetic data highlight the framework’s robustness in handling noise and redundancy, demonstrating its effectiveness in challenging environments. 
Validation on eight real-world datasets, including electroencephalography and Alzheimer’s neuroimaging data, confirms its superior performance, particularly with limited training samples. The implementation is available at https://github.com/archy666/CSMVIB.
Citations: 0
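The non-parametric mutual-information surrogate described above rests on the empirical Cauchy–Schwarz divergence, which needs only kernel Gram matrices rather than a variational neural estimator. Below is a minimal sketch of that estimator with a Gaussian kernel; the bandwidth and the toy Gaussian samples are illustrative choices, not the paper's configuration.

```python
import numpy as np

def gaussian_gram(X, Y, sigma):
    # Pairwise Gaussian kernel k(x, y) = exp(-||x - y||^2 / (2 sigma^2)).
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * sigma ** 2))

def cs_divergence(X, Y, sigma=1.0):
    # Empirical Cauchy-Schwarz divergence:
    #   D_CS = -2 log V_xy + log V_xx + log V_yy,
    # where each V is a mean kernel value (an information potential).
    v_xx = gaussian_gram(X, X, sigma).mean()
    v_yy = gaussian_gram(Y, Y, sigma).mean()
    v_xy = gaussian_gram(X, Y, sigma).mean()
    return -2 * np.log(v_xy) + np.log(v_xx) + np.log(v_yy)

rng = np.random.default_rng(0)
near = rng.normal(0.0, 1.0, size=(200, 2))
far = rng.normal(3.0, 1.0, size=(200, 2))
same = rng.normal(0.0, 1.0, size=(200, 2))
print(cs_divergence(near, far) > cs_divergence(near, same))  # shifted samples diverge more
```

By the Cauchy–Schwarz inequality the quantity is non-negative and vanishes when the two samples coincide, which is why it can stand in for a divergence without any distributional assumptions.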
Improving the local diagnostic explanations of diabetes mellitus with the ensemble of label noise filters 标签噪声滤波器集成改进糖尿病的局部诊断解释
IF 18.6 1区 计算机科学 Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date : 2025-01-02 DOI: 10.1016/j.inffus.2025.102928
Che Xu, Peng Zhu, Jiacun Wang, Giancarlo Fortino
In the era of big data, accurately diagnosing diabetes mellitus (DM) often requires fusing diverse types of information. Machine learning has emerged as a prevalent approach to achieve this. Despite its potential, clinical acceptance remains limited, primarily due to the lack of explainability in diagnostic predictions. The emergence of explainable artificial intelligence (XAI) offers a promising solution, yet both explainable and non-explainable models rely heavily on noise-free datasets. Label noise filters (LNFs) have been designed to enhance dataset quality by identifying and removing mislabeled samples, which can improve the predictive performance of diagnostic models. However, the impact of label noise on diagnostic explanations remains unexplored. To address this issue, this paper proposes an ensemble framework for LNFs that fuses information from different LNFs through three phases. In the first phase, a diverse pool of LNFs is generated. Second, the widely-used LIME (Local Interpretable Model-Agnostic Explanations) technique is employed to provide local explainability for diagnostic predictions made by black-box models. Finally, four ensemble strategies are designed to generate the final local diagnostic explanations for DM patients. The theoretical advantage of the ensemble is also demonstrated. The proposed framework is comprehensively evaluated on four DM datasets to assess its ability to mitigate the adverse impact of label noise on diagnostic explanations, compared to 24 baseline LNFs. Experimental results demonstrate that individual LNFs fail to consistently ensure the quality of diagnostic explanations, whereas the LNF ensemble based on local explanations provides a feasible solution to this challenge.
Citations: 0
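The ensemble idea above — fuse the verdicts of several label noise filters instead of trusting any single one — can be sketched with two toy filters whose flags are combined by majority vote. The centroid and k-NN filters here are simple stand-ins, not the 24 baseline LNFs from the paper, and the injected-noise dataset is fabricated for illustration.

```python
import numpy as np

def centroid_filter(X, y):
    # Flag samples whose label disagrees with the nearest class centroid.
    classes = np.unique(y)
    centroids = np.stack([X[y == c].mean(0) for c in classes])
    pred = classes[np.argmin(((X[:, None] - centroids) ** 2).sum(-1), 1)]
    return pred != y

def knn_filter(X, y, k=3):
    # Flag samples whose label disagrees with the majority of their k neighbors.
    d2 = ((X[:, None] - X[None, :]) ** 2).sum(-1)
    np.fill_diagonal(d2, np.inf)          # leave-one-out
    nn = np.argsort(d2, 1)[:, :k]
    pred = np.array([np.bincount(v).argmax() for v in y[nn]])
    return pred != y

def ensemble_filter(X, y, filters, threshold=0.5):
    # Majority-vote fusion: a sample is "noisy" if at least `threshold`
    # of the filters flag it.
    flags = np.stack([f(X, y) for f in filters])
    return flags.mean(0) >= threshold

rng = np.random.default_rng(1)
X = np.concatenate([rng.normal(0, 0.5, (50, 2)), rng.normal(4, 0.5, (50, 2))])
y_noisy = np.array([0] * 50 + [1] * 50)
y_noisy[:5] = 1                            # inject label noise into class 0
noisy = ensemble_filter(X, y_noisy, [centroid_filter, knn_filter])
print(noisy[:5].all(), noisy[5:].sum())    # flipped labels caught; few clean samples flagged
```

On a dataset with well-separated classes, every deliberately flipped label is flagged while clean samples are mostly left alone, which is the quality the paper's ensemble seeks before handing predictions to LIME for local explanation.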
TCIP: Network with topology capture and incongruity perception for sarcasm detection 基于拓扑捕获和不一致感知的讽刺检测网络
IF 18.6 1区 计算机科学 Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date : 2025-01-02 DOI: 10.1016/j.inffus.2024.102918
Ling Gao, Nan Sheng, Yiming Liu, Hao Xu
Multimodal sarcasm detection is a pivotal visual-linguistic task that aims to identify incongruity between the text purpose and the underlying meaning of other modal data. Existing works are dedicated to the learning of unimodal embeddings and the fusion of multimodal information. Nonetheless, they neglect the importance of topology and incongruity between multimodal information for sarcasm detection. Therefore, we propose a novel multimodal sarcasm detection network that incorporates multimodal topology capture and incongruity perception (TCIP). A text single-mode graph, a visual single-mode graph, and a visual–text heterogeneous graph are first established, where nodes contain visual elements and text elements. The association matrix of the heterogeneous graph encapsulates visual–visual associations, text–text associations, and visual–text associations. Subsequently, TCIP learns single-modal graphs and a heterogeneous graph based on graph convolutional networks to capture text topology information, visual topology information, and multimodal topology information. Furthermore, we pull together multimodal embeddings exhibiting consistent distributions and push away those with inconsistent distributions. TCIP finally feeds the fused embedding into a classifier to detect sarcasm results within visual–text pairs. Experimental results conducted on the multimodal sarcasm detection benchmarks and the multimodal science question answering dataset demonstrate the exceptional performance of TCIP.
Citations: 0
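The graph-convolutional propagation that TCIP applies to its single-modal and heterogeneous graphs can be sketched as one standard GCN layer: symmetrically normalize the adjacency matrix (with self-loops) and propagate node features through a projection. The block-structured toy adjacency below mimics a tiny visual–text heterogeneous graph; the node count, feature sizes, and random weights are illustrative assumptions, not the paper's architecture.

```python
import numpy as np

def gcn_layer(A, H, W):
    # One graph-convolution step: H' = ReLU(D^{-1/2} (A + I) D^{-1/2} H W).
    A_hat = A + np.eye(A.shape[0])
    d = A_hat.sum(1)
    A_hat = A_hat / np.sqrt(np.outer(d, d))   # symmetric normalization
    return np.maximum(A_hat @ H @ W, 0.0)

# Toy heterogeneous graph: nodes 0-1 are text elements, nodes 2-3 visual elements.
# The block structure of A encodes text-text, visual-visual, and visual-text edges.
A = np.array([[0, 1, 1, 0],
              [1, 0, 0, 1],
              [1, 0, 0, 1],
              [0, 1, 1, 0]], dtype=float)
rng = np.random.default_rng(0)
H = rng.normal(size=(4, 8))   # initial node embeddings
W = rng.normal(size=(8, 4))   # learnable projection (random here)
out = gcn_layer(A, H, W)
print(out.shape)              # (4, 4)
```

Because the association matrix mixes visual–visual, text–text, and visual–text edges, a single propagation step already blends information across modalities, which is what lets the model capture topology and cross-modal incongruity.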
A micro-action-based decision-making framework for simulating overtaking behaviors of heterogeneous pedestrians 基于微行为的异质行人超车模拟决策框架
IF 18.6 1区 计算机科学 Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date : 2025-01-01 DOI: 10.1016/j.inffus.2024.102898
Jingxuan Peng, Zhonghua Wei, Yanyan Chen, Shaofan Wang, Yongxing Li, Liang Chen, Fujiyama Taku
In many public places, the heterogeneity of pedestrians leads to diverse travel behaviors, including overtaking behavior. However, owing to factors such as the heterogeneous attributes of pedestrians and changes in the surrounding environment, previous models for simulating overtaking behavior suffer from behavior loss or decision imbalance. Observing that overtaking behavior can be regarded as a process consisting of multiple micro-actions, this paper proposes a micro-action-based macro-to-micro decision-making (M3DM) framework to simulate fine-grained overtaking behavior of heterogeneous pedestrians. The framework incorporates two modules: a micro-action modeling (MM) module and a macro-to-micro decision-making (MMDM) module. The former constructs the mapping between the proposed micro-actions and multiple personality characterizations and builds a simulation model for each micro-action, while the latter integrates the density-based macro decision and the energy-consumption-based micro decision into the framework, achieving a more realistic simulation of overtaking behavior. Extensive real-world experiments are conducted to calibrate the parameters and verify the rationality of our framework. Moreover, two different simulation cases confirm the authenticity of the proposed simulation model.
Citations: 0
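The macro-to-micro decision flow described above can be caricatured in a few lines: a density-based macro stage first decides whether overtaking is feasible at all, and only then does a micro stage pick the candidate micro-action with the lowest energy cost. Everything here — the action names, the energy costs, and the density threshold — is a hypothetical toy, not the calibrated M3DM model.

```python
def decide(density, actions, density_threshold=1.5):
    # Macro stage: in dense crowds, suppress overtaking and keep following.
    if density > density_threshold:
        return "follow"
    # Micro stage: otherwise pick the micro-action with the lowest
    # (assumed) energy cost.
    return min(actions, key=actions.get)

# Hypothetical energy costs (arbitrary units) for candidate micro-actions.
actions = {"sidestep-left": 2.1, "sidestep-right": 1.7, "accelerate": 3.0}
print(decide(0.8, actions))   # sidestep-right
print(decide(2.0, actions))   # follow
```

The point of the two-stage split is that the cheap macro check (crowd density) gates the finer-grained micro optimization, mirroring how the MMDM module fuses density-based and energy-based decisions.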