Fusion of ViT Technique and Image Filtering in Deep Learning for Plant Pests and Diseases Recognition
Van-Dung Hoang, Thanh-an Michel Pham
2023 International Conference on System Science and Engineering (ICSSE), published 2023-07-27
DOI: 10.1109/ICSSE58758.2023.10227192
Citations: 0
Abstract
Over the past decade, deep learning methods based on convolutional neural network (CNN) architectures have achieved breakthroughs in accuracy compared to traditional machine learning methods. However, these approaches still face limitations in processing time and accuracy when applied to large and difficult datasets. Recently, new methods based on the transformer learning approach have been applied to image processing, and this direction has shown promising results in terms of both accuracy and computational time. This paper presents a new approach that combines an image-filtering pre-processing technique with vision transformer (ViT) learning for the recognition of plant insect pests and diseases. The proposed solution involves several stages: neural network-based image filtering, whose output is passed through a ViT module to extract feature maps, which are then fed to a multi-head network for classification. The method applies the filtering pre-processing to highlight features before the ViT stage, rather than applying the ViT directly to raw input images. Furthermore, element-wise multiplication in the frequency domain reduces processing time compared with convolutional processing in the spatial domain. Experimental results demonstrate that the filtering pre-processing does not significantly increase the number of learnable parameters or the training time compared to using the ViT directly, while improving accuracy over well-known deep CNN models. The results also show that both the plain ViT solution and the proposed method achieve higher accuracy than CNN-based deep learning methods.
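The frequency-domain shortcut mentioned in the abstract rests on the convolution theorem: circular convolution in the spatial domain equals element-wise multiplication of the inputs' Fourier transforms. A minimal NumPy sketch of that equivalence (the function names and array sizes are illustrative assumptions, not the authors' implementation):

```python
import numpy as np

# Hedged sketch (not the paper's code): the convolution theorem lets a
# sliding-window spatial convolution be replaced by two FFTs plus an
# element-wise product, which is cheaper for large filters.

def circular_conv2d(image, kernel):
    """Reference: naive spatial-domain circular 2-D convolution."""
    out = np.zeros(image.shape, dtype=float)
    for ky, kx in zip(*np.nonzero(kernel)):
        # out[y, x] += kernel[ky, kx] * image[y - ky, x - kx], with wrap-around
        out += kernel[ky, kx] * np.roll(np.roll(image, ky, axis=0), kx, axis=1)
    return out

def fft_conv2d(image, kernel):
    """Same result via element-wise multiplication in the frequency domain."""
    spectrum = np.fft.fft2(image) * np.fft.fft2(kernel)
    return np.real(np.fft.ifft2(spectrum))

rng = np.random.default_rng(0)
img = rng.standard_normal((64, 64))
kern = np.zeros((64, 64))
kern[:3, :3] = rng.standard_normal((3, 3))  # 3x3 filter, zero-padded to image size

assert np.allclose(circular_conv2d(img, kern), fft_conv2d(img, kern))
```

For an H×W image and a Kh×Kw filter, the spatial pass costs O(H·W·Kh·Kw) multiply-adds, while the FFT route costs O(H·W·log(H·W)) regardless of filter size, which is the processing-time saving the abstract alludes to.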