{"title":"A Survey on: Application of Transformer in Computer Vision","authors":"Zhenghua Zhang, Zhangjie Gong, Qingqing Hong","doi":"10.12792/icisip2021.006","DOIUrl":null,"url":null,"abstract":"In the past few years, convolutional neural networks have been considered the mainstream network for processing images. Transformer first proposed a brand new deep neural network in 2017, based mainly on the self-attention mechanism, and has achieved amazing results in the field of natural language processing. Compared with traditional convolutional networks and recurrent networks, the model is superior in quality, has stronger parallelism, and requires less training time. Because of these powerful advantages, more and more related workers are expanding how Transformer is applied to computer vision. This article aims to provide a comprehensive overview of the application of Transformer in computer vision. We first introduce the self-attention mechanism, because it is an important component of Transformer, namely single-headed attention mechanism, multi-headed attention mechanism, position coding, etc. And introduces the reformer model after the transformer is improved. We then introduced some applications of Transformer in computer vision, image classification, object detection, and image processing. At the end of this article, we studied the future research direction and development of Transformer in computer vision, hoping that this article can arouse further interest in Transformer.","PeriodicalId":431446,"journal":{"name":"The Proceedings of The 8th International Conference on Intelligent Systems and Image Processing 2021","volume":"19 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"2","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"The Proceedings of The 8th International Conference on Intelligent Systems and Image Processing 2021","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.12792/icisip2021.006","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Abstract
In the past few years, convolutional neural networks have been considered the mainstream networks for processing images. The Transformer, a new deep neural network architecture first proposed in 2017 and based mainly on the self-attention mechanism, has achieved remarkable results in natural language processing. Compared with traditional convolutional and recurrent networks, it delivers higher-quality results, offers stronger parallelism, and requires less training time. Because of these advantages, more and more researchers are extending the Transformer to computer vision. This article aims to provide a comprehensive overview of the application of the Transformer in computer vision. We first introduce the self-attention mechanism, an essential component of the Transformer, covering single-head attention, multi-head attention, and positional encoding, and we also introduce the Reformer, an improved variant of the Transformer. We then present applications of the Transformer in computer vision, including image classification, object detection, and image processing. Finally, we discuss future research directions for the Transformer in computer vision, in the hope that this article will stimulate further interest in the Transformer.
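Since the abstract centers on the self-attention mechanism, the following is a minimal NumPy sketch of single-head scaled dot-product self-attention as defined in the original 2017 Transformer paper (Vaswani et al.); it is an illustrative reconstruction, not code from the survey, and the function names, toy shapes, and random projection matrices are assumptions chosen for the example.

```python
# Minimal sketch of single-head scaled dot-product self-attention.
# Illustrative only; dimensions and weights below are toy values.
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(x, w_q, w_k, w_v):
    """x: (seq_len, d_model); w_q/w_k/w_v: (d_model, d_k) projections."""
    q, k, v = x @ w_q, x @ w_k, x @ w_v       # project tokens to queries, keys, values
    scores = q @ k.T / np.sqrt(k.shape[-1])   # pairwise similarities, scaled by sqrt(d_k)
    weights = softmax(scores, axis=-1)        # attention weights over all positions
    return weights @ v                        # weighted sum of value vectors

# Toy usage: 4 "tokens" (e.g. image patches) with d_model = d_k = 8.
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))
w_q, w_k, w_v = (rng.normal(size=(8, 8)) for _ in range(3))
out = self_attention(x, w_q, w_k, w_v)
print(out.shape)  # (4, 8)
```

Multi-head attention, as discussed in the survey, would run several such heads in parallel on lower-dimensional projections and concatenate their outputs; positional encoding adds position-dependent vectors to the inputs so the otherwise permutation-invariant attention can use order information.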