Title: EViT: An Eagle Vision Transformer With Bi-Fovea Self-Attention
Authors: Yulong Shi; Mingwei Sun; Yongshuai Wang; Jiahao Ma; Zengqiang Chen
Journal: IEEE Transactions on Cybernetics, vol. 55, no. 3, pp. 1288-1300 (Impact Factor 10.5; JCR Q1, Automation & Control Systems)
Publication type: Journal Article
Published: 2025-02-06
DOI: 10.1109/TCYB.2025.3532282
URL: https://ieeexplore.ieee.org/document/10876565/
Citations: 0
Abstract
Owing to advancements in deep learning technology, vision transformers (ViTs) have demonstrated impressive performance in various computer vision tasks. Nonetheless, ViTs still face some challenges, such as high computational complexity and the absence of desirable inductive biases. To alleviate these issues, the potential advantages of combining eagle vision with ViTs are explored. A bi-fovea visual interaction (BFVI) structure inspired by the unique physiological and visual characteristics of eagle eyes is introduced. Based on this structural design approach, a novel bi-fovea self-attention (BFSA) mechanism and bi-fovea feedforward network (BFFN) are proposed. These components are employed to mimic the hierarchical and parallel information processing scheme of the biological visual cortex, thereby enabling networks to learn the feature representations of the targets in a coarse-to-fine manner. Furthermore, a bionic eagle vision (BEV) block is designed as the basic building unit based on the BFSA mechanism and the BFFN. By stacking the BEV blocks, a unified and efficient family of pyramid backbone networks called eagle ViTs (EViTs) is developed. Experimental results indicate that the EViTs exhibit highly competitive performance in various computer vision tasks, demonstrating their potential as backbone networks. In terms of computational efficiency and scalability, EViTs show significant advantages compared with other counterparts. The developed code is available at https://github.com/nkusyl/EViT.
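The "high computational complexity" mentioned in the abstract refers to the well-known quadratic growth of global self-attention cost with the number of tokens. The following is a minimal sketch of that scaling only. It does not reproduce the BFSA mechanism or any EViT configuration; the function names, the channel width of 64, the 224-pixel input, and the strides (4, 8, 16, 32) are illustrative assumptions, chosen because such strides are common in pyramid backbones.

```python
def attention_macs(num_tokens: int, dim: int) -> int:
    """Approximate multiply-accumulates for one global self-attention layer.

    Counts the Q/K/V/output projections (4 * N * d^2) plus the two
    N x N matrix products QK^T and AV (2 * N^2 * d).
    """
    projections = 4 * num_tokens * dim * dim
    attention_maps = 2 * num_tokens * num_tokens * dim
    return projections + attention_maps


def tokens_for(image_size: int, stride: int) -> int:
    """Number of tokens when a square image is split with the given stride."""
    side = image_size // stride
    return side * side


if __name__ == "__main__":
    dim = 64  # hypothetical channel width, held fixed to isolate token count
    for stride in (4, 8, 16, 32):  # assumed pyramid strides, not taken from the paper
        n = tokens_for(224, stride)
        print(f"stride {stride:>2}: {n:>5} tokens, {attention_macs(n, dim):,} MACs")
```

Halving the stride quadruples the token count, so the N x N term grows sixteenfold. This is why high-resolution stages in a pyramid design are the expensive ones for plain global attention.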
About the journal:
The scope of the IEEE Transactions on Cybernetics includes computational approaches to the field of cybernetics. Specifically, the Transactions welcomes papers on communication and control across machines or among machines, humans, and organizations. The scope includes such areas as computational intelligence, computer vision, neural networks, genetic algorithms, machine learning, fuzzy systems, cognitive systems, decision making, and robotics, to the extent that they contribute to the theme of cybernetics or demonstrate an application of cybernetics principles.