TFS-Net: Temporal first simulation network for video saliency prediction

Expert Systems with Applications · IF 7.5 · CAS Region 1 (Computer Science) · JCR Q1, COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE · Pub Date: 2025-07-05 · Epub Date: 2025-04-21 · DOI: 10.1016/j.eswa.2025.127652
Longyi Li, Liyan Dong, Hao Zhang, Jun Qin, Zhengtai Zhang, Minghui Sun
Expert Systems with Applications, Volume 282, Article 127652. URL: https://www.sciencedirect.com/science/article/pii/S0957417425012746
Citations: 0

Abstract

Video saliency prediction (VSP) plays a critical role in modern video processing systems by optimizing computational resource allocation and enhancing overall system performance. However, existing VSP methods either lack effective temporal modeling or incur high computational costs, particularly struggling with the initialization of video sequences. This paper presents TFS-Net, a novel temporal-first simulation network for VSP that integrates both static and dynamic modeling via parallel-optimized self-attention mechanisms. Specifically, TFS-Net addresses the challenge of initial frame processing with the innovative F31 algorithm and improves multi-scale spatiotemporal feature integration through a Hierarchical Decoder with Multi-dimensional Attention (HDMA). Drawing inspiration from primate saccadic behavior, the F31 algorithm optimizes processing efficiency during both training and inference phases, demonstrating particular effectiveness in unmanned aerial vehicle (UAV) real-time applications. Extensive evaluations on public datasets demonstrate that TFS-Net achieves significant improvements over state-of-the-art methods, with gains of 14.6%, 12.0%, and 11.2% in AUC-J, CC, and SIM metrics, respectively. Further experiments on UAV video analysis validate the model’s robustness and practicality in real-world scenarios.
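For context, the CC and SIM metrics cited in the reported gains are standard saliency-evaluation measures: CC is the Pearson linear correlation between the predicted and ground-truth saliency maps, and SIM is the sum of element-wise minima after normalizing both maps to probability distributions. The sketch below is a minimal NumPy illustration of these definitions, not the paper's evaluation code; AUC-J additionally requires discrete fixation locations and is omitted here.

```python
import numpy as np

def cc(pred: np.ndarray, gt: np.ndarray) -> float:
    """Pearson linear correlation coefficient (CC) between two saliency maps."""
    p = (pred - pred.mean()) / (pred.std() + 1e-8)
    g = (gt - gt.mean()) / (gt.std() + 1e-8)
    return float((p * g).mean())

def sim(pred: np.ndarray, gt: np.ndarray) -> float:
    """Similarity (SIM): sum of element-wise minima after normalizing
    each map to sum to 1 (a probability distribution)."""
    p = pred / (pred.sum() + 1e-8)
    g = gt / (gt.sum() + 1e-8)
    return float(np.minimum(p, g).sum())
```

Both metrics reach 1.0 when prediction and ground truth are identical; SIM drops toward 0 as the distributions' overlap shrinks.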
Source journal

Expert Systems with Applications (Engineering Technology — Engineering: Electrical & Electronic)
CiteScore: 13.80
Self-citation rate: 10.60%
Articles published per year: 2045
Review turnaround: 8.7 months
Journal introduction: Expert Systems With Applications is an international journal dedicated to the exchange of information on expert and intelligent systems used globally in industry, government, and universities. The journal emphasizes original papers covering the design, development, testing, implementation, and management of these systems, offering practical guidelines. It spans various sectors such as finance, engineering, marketing, law, project management, information management, medicine, and more. The journal also welcomes papers on multi-agent systems, knowledge management, neural networks, knowledge discovery, data mining, and other related areas, excluding applications to military/defense systems.
Latest articles from this journal

Topology-inspired metric for detecting potential defects in lithography
An interpretable intrusion detection framework based on ensemble neural networks for dynamic network environments
Multi-sequence parotid gland lesion segmentation via expert text-guided segment anything model
Federated learning of diffusion networks
A collaborative optimization framework for efficient long-sequence Audio-Visual understanding