Combining optical flow and Swin Transformer for Space-Time video super-resolution
Journal: Engineering Applications of Artificial Intelligence (JCR Q1, Automation & Control Systems; impact factor 7.5)
Published: 2024-09-01
DOI: 10.1016/j.engappai.2024.109227
URL: https://www.sciencedirect.com/science/article/pii/S095219762401385X
Space–time video super-resolution aims to convert low-frame-rate, low-resolution videos into high-frame-rate, high-resolution ones. While existing Transformer-based methods have achieved results comparable to those of convolutional neural network-based methods, the computational cost of Transformers limits their performance when computational resources are constrained. Moreover, Swin Transformer may fail to fully exploit the spatio-temporal information of video frames because of its limited window size, which impedes its effectiveness in handling large motions. To address these limitations, we propose an end-to-end space–time video super-resolution architecture based on optical flow alignment and Swin Transformer. The alignment module extracts spatio-temporal information from adjacent frames without significantly increasing the computational burden. Additionally, we design a residual convolution layer that enhances the translational invariance of the features extracted by Swin Transformer and introduces additional nonlinear transformations. Experimental results demonstrate that the proposed method outperforms state-of-the-art methods on various benchmark datasets. In terms of Peak Signal-to-Noise Ratio, it outperforms state-of-the-art methods by at least 0.15 dB on the Vimeo-Medium dataset.
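The abstract does not describe the internals of the alignment module, so the following is only a minimal sketch of the general idea behind optical flow alignment, not the authors' implementation. It assumes backward warping with bilinear sampling, a common choice in flow-based video methods, and uses the standard PSNR definition to show how alignment can bring a neighbouring frame closer to a reference frame. The function names `backward_warp` and `psnr`, the toy frames, and the hand-written flow field are all hypothetical; in a real system the flow would come from a flow estimator.

```python
import math


def backward_warp(frame, flow):
    """Warp a 2-D frame with a dense flow field (illustrative sketch).

    frame: H x W list of floats.
    flow:  H x W list of (dx, dy) tuples; for each output pixel (x, y) it
           points to the location (x + dx, y + dy) to sample in `frame`.
    Sample positions are clamped to the image border.
    """
    height, width = len(frame), len(frame[0])
    warped = [[0.0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            dx, dy = flow[y][x]
            sx = min(max(x + dx, 0.0), width - 1.0)
            sy = min(max(y + dy, 0.0), height - 1.0)
            x0, y0 = int(sx), int(sy)  # sx, sy >= 0, so int() floors
            x1, y1 = min(x0 + 1, width - 1), min(y0 + 1, height - 1)
            wx, wy = sx - x0, sy - y0
            # Bilinear interpolation between the four nearest pixels.
            top = (1 - wx) * frame[y0][x0] + wx * frame[y0][x1]
            bottom = (1 - wx) * frame[y1][x0] + wx * frame[y1][x1]
            warped[y][x] = (1 - wy) * top + wy * bottom
    return warped


def psnr(reference, estimate, max_value=255.0):
    """Peak Signal-to-Noise Ratio in dB: 10 * log10(max_value^2 / MSE)."""
    squared_error, count = 0.0, 0
    for ref_row, est_row in zip(reference, estimate):
        for r, e in zip(ref_row, est_row):
            squared_error += (r - e) ** 2
            count += 1
    mse = squared_error / count
    return math.inf if mse == 0 else 10.0 * math.log10(max_value ** 2 / mse)


# Toy data (assumed): the neighbouring frame is the target shifted right by one pixel.
target = [[10.0 * x for x in range(6)] for _ in range(4)]
neighbour = [[row[max(x - 1, 0)] for x in range(6)] for row in target]
# Hand-written flow: every target pixel finds its content one pixel to the right.
flow = [[(1.0, 0.0)] * 6 for _ in range(4)]
aligned = backward_warp(neighbour, flow)

print("PSNR before alignment:", psnr(target, neighbour))
print("PSNR after alignment: ", psnr(target, aligned))
```

In this toy case the aligned frame differs from the target only at the border column, so its PSNR is higher than that of the unaligned neighbour. As a reading aid for the reported result: by the PSNR definition, a gain of 0.15 dB corresponds to a mean squared error that is lower by a factor of 10^(-0.015) ≈ 0.966, roughly 3.4%.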
About the journal:
Artificial Intelligence (AI) is pivotal in driving the fourth industrial revolution, witnessing remarkable advancements across various machine learning methodologies. AI techniques have become indispensable tools for practicing engineers, enabling them to tackle previously insurmountable challenges. Engineering Applications of Artificial Intelligence serves as a global platform for the swift dissemination of research elucidating the practical application of AI methods across all engineering disciplines. Submitted papers are expected to present novel aspects of AI utilized in real-world engineering applications, validated using publicly available datasets to ensure the replicability of research outcomes. Join us in exploring the transformative potential of AI in engineering.