Detecting plant diseases is a crucial aspect of modern agriculture, playing a key role in maintaining crop health and ensuring sustainable yields. Traditional approaches, though still valuable, often rely on manual inspection or conventional machine learning (ML) techniques, both of which face limitations in scalability and accuracy. The emergence of Vision Transformers (ViTs) marks a significant shift in this landscape, enabling superior modeling of long-range dependencies and improved scalability for complex visual tasks. This survey provides a rigorous and structured analysis of impactful studies that employ ViT-based models, along with a comprehensive categorization of existing research. It also offers a quantitative synthesis of reported performance, with accuracies ranging from 75.00% to 100.00%, highlighting clear trends in model effectiveness and identifying consistently high-performing architectures. In addition, this study examines the inductive biases of CNNs and ViTs, offering the first analysis of these architectural priors within an agricultural context. Further contributions include a comparative taxonomy of prior studies, an evaluation of dataset limitations and metric inconsistencies, and a statistical assessment of model efficiency across diverse crop-image sources. Collectively, these efforts clarify the current state of the field, identify critical research gaps, and outline key challenges, such as data diversity, interpretability, computational cost, and field adaptability, that must be addressed to advance the practical deployment of ViT technologies in precision agriculture.
