Classifying and identifying wild mushroom species from on-site images is one of the most effective ways to prevent harm caused by eating wild mushrooms. However, complex natural scenes and the morphological similarity between mushroom species make accurate classification and recognition difficult. To address this, this paper proposes an improved ConvNeXt V2 network model for classifying and recognizing mushrooms that appear in complex scenes and look alike. First, this study applies data augmentation techniques such as image flipping, noise addition, and mosaic to balance the number of images across classes, and constructs a mushroom image dataset of 10,986 images in 18 categories. Second, a cross-module approach is used to extract and fuse image features of different dimensions, which strengthens the feature-capture capability of the ConvNeXt V2 model. In addition, the model is optimized with one-hot encoding and spatial pyramid pooling. The experimental results show that the improved ConvNeXt V2 model outperforms the comparison models ResNet, MobileViT, Swin Transformer, ConvNeXt, and the original ConvNeXt V2 in accuracy, precision, recall, and F1-score, reaching 96.7%, 96.84%, 96.83%, and 96.84%, respectively. Ablation experiments further confirm the effectiveness of the proposed improvements, showing that the method can improve both the efficiency and the accuracy of mushroom image classification and recognition.
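Spatial pyramid pooling turns a feature map of any height and width into a fixed-length vector by pooling it over several grid resolutions and concatenating the results. This property is what allows a classification head to accept inputs of varying size. The abstract does not say which pyramid levels or pooling operation the improved model uses. The sketch below is therefore a minimal, dependency-free illustration that assumes max pooling over hypothetical 1×1, 2×2 and 4×4 grids, the arrangement popularized by SPP-net. It is not the authors' implementation, and the names `adaptive_max_pool` and `spatial_pyramid_pool` are illustrative only.

```python
from typing import List, Sequence

# A feature map laid out as [channels][height][width].
FeatureMap = List[List[List[float]]]


def _ceil_div(a: int, b: int) -> int:
    """Integer ceiling division for non-negative a and positive b."""
    return -(-a // b)


def adaptive_max_pool(channel: List[List[float]], n: int) -> List[float]:
    """Max-pool one H x W channel into an n x n grid of bins (row-major order).

    Bin edges follow the common adaptive-pooling rule start = floor(i*H/n),
    end = ceil((i+1)*H/n), so every bin is non-empty even when n > H or n > W.
    """
    h, w = len(channel), len(channel[0])
    pooled = []
    for i in range(n):
        r0, r1 = (i * h) // n, _ceil_div((i + 1) * h, n)
        for j in range(n):
            c0, c1 = (j * w) // n, _ceil_div((j + 1) * w, n)
            pooled.append(max(channel[r][c] for r in range(r0, r1) for c in range(c0, c1)))
    return pooled


def spatial_pyramid_pool(fmap: FeatureMap, levels: Sequence[int] = (1, 2, 4)) -> List[float]:
    """Concatenate the pooled grids of every channel at each pyramid level.

    The levels (1, 2, 4) are an assumption for illustration. The output length
    is C * sum(n * n for n in levels), independent of the input height and width.
    """
    features: List[float] = []
    for n in levels:
        for channel in fmap:
            features.extend(adaptive_max_pool(channel, n))
    return features


if __name__ == "__main__":
    # Two single-channel inputs of different sizes give vectors of equal length.
    small = [[[float(r * 3 + c) for c in range(3)] for r in range(3)]]
    large = [[[float(r * 7 + c) for c in range(7)] for r in range(5)]]
    print(len(spatial_pyramid_pool(small)), len(spatial_pyramid_pool(large)))
```

Both inputs produce 21 values (1 + 4 + 16 bins for one channel). This equal length is why a pooling layer of this kind can sit between a convolutional backbone and a fully connected classifier without resizing the images first. In a real model, the same operation would run on framework tensors, for example with a deep learning library's adaptive pooling layer, rather than on nested lists.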
Practical Application: The method proposed in this paper can be used to distinguish edible from inedible mushrooms, providing technical support for reducing the incidence of mushroom poisoning and helping to ensure food safety.