{"title":"HCT: Hybrid CNN-Transformer Networks for Super-Resolution","authors":"Jiabin Zhang, Xiaoru Wang, Han Xia, Xiaolong Li","doi":"10.1109/ICCECE58074.2023.10135281","DOIUrl":null,"url":null,"abstract":"Recently, several computer vision tasks have begun to adopt transformer-based approaches with promising results. Using a completely transformer-based architecture in image recovery achieves better performance than the existing CNN approach, but the existing vision transformers lack the scalability for high-resolution images, which means that transformers are underutilized in image restoration tasks. We propose a hybrid architecture (HCT) that uses both CNN and transformer to improve image restoration. HCT consists of transformer and CNN branches. By fully integrating the two branches, we strengthen the network's ability of parameter sharing and local information aggregation, and also increase the network's ability to integrate global information, and finally achieve the purpose of improving the image recovery effect. Our proposed transformer branch uses a spatial fusion adaptive attention model that blends local and global attention improving image restoration while reducing computing costs. Extensive experiments show that HCT achieves competitive results in super-resolution tasks.","PeriodicalId":120030,"journal":{"name":"2023 3rd International Conference on Consumer Electronics and Computer Engineering (ICCECE)","volume":null,"pages":null},"PeriodicalIF":0.0000,"publicationDate":"2023-01-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2023 3rd International Conference on Consumer Electronics and Computer Engineering (ICCECE)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICCECE58074.2023.10135281","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Abstract
Recently, several computer vision tasks have begun to adopt transformer-based approaches with promising results. Purely transformer-based architectures achieve better image-restoration performance than existing CNN approaches, but current vision transformers scale poorly to high-resolution images, so transformers remain underutilized in image restoration tasks. We propose HCT, a hybrid architecture that combines CNN and transformer components to improve image restoration. HCT consists of a transformer branch and a CNN branch. By fully integrating the two branches, we strengthen the network's parameter sharing and local information aggregation while also improving its ability to integrate global information, ultimately yielding better restoration quality. The proposed transformer branch uses a spatial fusion adaptive attention model that blends local and global attention, improving image restoration while reducing computational cost. Extensive experiments show that HCT achieves competitive results on super-resolution tasks.
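
The abstract describes a two-branch design: a CNN branch for local information aggregation and a transformer branch for global context, fused into a single feature stream. As a rough illustration only, the PyTorch sketch below shows one way such a hybrid block could be wired. The module names, the 1x1-convolution fusion, and the plain global multi-head attention are assumptions for illustration; they are not the paper's actual layers, and the spatial fusion adaptive attention described in the abstract is not reproduced here.

```python
# Minimal sketch of a hybrid CNN + transformer block in the spirit of HCT.
# All module names, dimensions, and the fusion strategy are illustrative
# assumptions; the abstract does not specify the paper's exact layer design.
import torch
import torch.nn as nn


class CNNBranch(nn.Module):
    """Local feature aggregation with convolutions (assumed design)."""
    def __init__(self, dim: int):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(dim, dim, kernel_size=3, padding=1),
            nn.GELU(),
            nn.Conv2d(dim, dim, kernel_size=3, padding=1),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.body(x)


class TransformerBranch(nn.Module):
    """Global context via self-attention over spatial tokens (assumed design)."""
    def __init__(self, dim: int, heads: int = 4):
        super().__init__()
        self.norm = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        tokens = x.flatten(2).transpose(1, 2)       # (B, H*W, C)
        tokens = self.norm(tokens)
        out, _ = self.attn(tokens, tokens, tokens)  # global attention
        return out.transpose(1, 2).reshape(b, c, h, w)


class HybridBlock(nn.Module):
    """Fuse the two branches with a 1x1 conv and a residual connection (assumed fusion)."""
    def __init__(self, dim: int):
        super().__init__()
        self.cnn = CNNBranch(dim)
        self.transformer = TransformerBranch(dim)
        self.fuse = nn.Conv2d(2 * dim, dim, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        local_feat = self.cnn(x)
        global_feat = self.transformer(x)
        return x + self.fuse(torch.cat([local_feat, global_feat], dim=1))


if __name__ == "__main__":
    x = torch.randn(1, 32, 48, 48)   # a low-resolution feature map
    y = HybridBlock(32)(x)
    print(y.shape)                   # torch.Size([1, 32, 48, 48])
```

In a full super-resolution network, several such blocks would typically be stacked between a shallow feature extractor and an upsampling head; the stacking depth and upsampler are likewise left unspecified by the abstract.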