DCTCNet: Sequency discrete cosine transform convolution network for visual recognition
Jiayong Bao, Jiangshe Zhang, Chunxia Zhang, Lili Bao
Neural Networks, vol. 185, p. 107143, published 2025-01-18. DOI: 10.1016/j.neunet.2025.107143
Abstract
The discrete cosine transform (DCT) has been widely used in computer vision tasks thanks to its high compression ratio and high-quality visual representation. However, conventional DCT is affected by the size of the transform region, which leads to blocking artifacts. Eliminating these blocking artifacts so that the transform can serve vision tasks efficiently is therefore both important and challenging. In this paper, we introduce the All Phase Sequency DCT (APSeDCT) into convolutional networks to extract multi-frequency information from deep features. Because APSeDCT is equivalent to a convolution operation, we construct a corresponding convolution module, called APSeDCT Convolution (APSeDCTConv), which transfers across architectures as readily as vanilla convolution. We then propose an augmented convolutional operator, MultiConv, built on APSeDCTConv. Replacing the last three bottleneck blocks of ResNet with MultiConv not only reduces computational cost and the number of parameters but also yields strong performance in classification, object detection, and instance segmentation. Extensive experiments show that APSeDCTConv augmentation leads to consistent improvements in image classification on ImageNet across models and scales, including ResNet, Res2Net, and ResNeXt, and achieves AP gains of 0.5%-1.1% for object detection and 0.4%-0.7% for instance segmentation on the COCO benchmark compared with the baseline.
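The abstract notes that APSeDCT can be expressed as a convolution. As a rough illustration of that general idea only (not the authors' APSeDCTConv, whose exact basis and hyperparameters are not given here), the sketch below builds a depthwise convolution whose fixed kernels are the 2-D DCT-II basis functions, so each input channel is decomposed into k×k frequency sub-bands. The module name, kernel size k, and stride choice are illustrative assumptions.

```python
# Minimal sketch (not the paper's code): a block-wise DCT written as a
# convolution with fixed cosine-basis filters, so frequency decomposition
# can be dropped into a CNN like any other conv layer.
import math
import torch
import torch.nn as nn


class DCTBasisConv2d(nn.Module):
    """Depthwise convolution whose kernels are the 2-D DCT-II basis functions.

    Each input channel is expanded into k*k frequency sub-bands. stride=k
    mimics a non-overlapping block transform; stride=1 gives an overlapping
    (sliding-window) variant.
    """

    def __init__(self, in_channels: int, k: int = 4, stride: int = 1):
        super().__init__()
        n = torch.arange(k, dtype=torch.float32)
        # Orthonormal 1-D DCT-II matrix: dct[x, u] = cos(pi * (x + 0.5) * u / k)
        dct = torch.cos(math.pi * (n[:, None] + 0.5) * n[None, :] / k)
        dct[:, 0] *= 1.0 / math.sqrt(2.0)
        dct *= math.sqrt(2.0 / k)
        # Separable 2-D basis: k*k filters of spatial size k x k
        basis = torch.einsum('xu,yv->uvxy', dct, dct).reshape(k * k, 1, k, k)
        # One full basis per input channel (depthwise/grouped layout)
        weight = basis.repeat(in_channels, 1, 1, 1)  # (C*k*k, 1, k, k)
        self.register_buffer('weight', weight)
        self.in_channels, self.k, self.stride = in_channels, k, stride

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        pad = self.k // 2 if self.stride == 1 else 0
        # Output has C * k*k channels: every channel's k*k frequency bands
        return nn.functional.conv2d(
            x, self.weight, stride=self.stride, padding=pad,
            groups=self.in_channels)


if __name__ == '__main__':
    x = torch.randn(2, 16, 32, 32)
    dct_conv = DCTBasisConv2d(in_channels=16, k=4, stride=1)
    print(dct_conv(x).shape)  # torch.Size([2, 256, 33, 33])
```

Because the basis is fixed (registered as a buffer, not a parameter), such a layer adds no learnable weights; how the resulting frequency channels are mixed and combined with ordinary convolutions is where a design like MultiConv would come in.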
Journal Introduction:
Neural Networks is a platform that aims to foster an international community of scholars and practitioners interested in neural networks, deep learning, and other approaches to artificial intelligence and machine learning. Our journal invites submissions covering various aspects of neural networks research, from computational neuroscience and cognitive modeling to mathematical analyses and engineering applications. By providing a forum for interdisciplinary discussions between biology and technology, we aim to encourage the development of biologically-inspired artificial intelligence.