Skin cancer continues to pose a major global health challenge, and its early identification is essential for improving patient outcomes. Traditional diagnostic practices rely heavily on clinician expertise and manual interpretation of dermoscopic images, making the process subjective, inconsistent, and time-consuming. To address these limitations, this work introduces STViTDA-Net, an explainable transformer-based framework designed for fast, objective, and scalable multi-class skin cancer classification. The model integrates three key components: STGAN for class-balanced dermoscopic image augmentation, ViT-MAE for robust hierarchical feature learning through masked patch reconstruction, and a Deformable Attention Transformer Encoder that adaptively focuses on irregular lesion boundaries and subtle spatial variations. Preprocessing with Error Level Analysis (ELA) enhances fine-grained diagnostic cues, while Grad-CAM provides interpretable heatmaps that highlight the regions influencing the model's predictions. Unlike manual dermoscopic evaluation, STViTDA-Net performs end-to-end inference within milliseconds and delivers consistent, expert-independent predictions supported by visual explanations. When evaluated on the ISIC2019 dataset comprising nine lesion categories, the model achieves 99.35 % accuracy, 99.0 % precision, 99.5 % recall, 99.2 % F1-score, and 99.2 % AUC-ROC, surpassing existing CNN and transformer baselines. By unifying class-balanced augmentation, adaptive feature encoding, deformable attention, and explainable outputs, STViTDA-Net establishes a powerful and efficient solution for automated dermatological diagnosis.
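To make the ELA preprocessing step concrete, the following is a minimal sketch of Error Level Analysis using Pillow and NumPy; the JPEG quality factor, scaling, and file names are illustrative assumptions rather than values taken from the paper.

```python
# Minimal Error Level Analysis (ELA) sketch using Pillow and NumPy.
# The JPEG quality (90) and contrast rescaling are illustrative choices,
# not parameters reported by STViTDA-Net.
import io
import numpy as np
from PIL import Image


def error_level_analysis(image_path: str, quality: int = 90) -> Image.Image:
    """Re-save the image as JPEG and return the amplified pixel-wise difference."""
    original = Image.open(image_path).convert("RGB")

    # Re-compress the image at a known JPEG quality.
    buffer = io.BytesIO()
    original.save(buffer, format="JPEG", quality=quality)
    buffer.seek(0)
    recompressed = Image.open(buffer).convert("RGB")

    # The absolute difference highlights regions that respond differently to
    # compression, which can accentuate fine-grained texture cues in lesions.
    diff = np.abs(
        np.asarray(original, dtype=np.int16) - np.asarray(recompressed, dtype=np.int16)
    ).astype(np.uint8)

    # Rescale so the strongest difference maps to 255 for visual contrast.
    max_diff = max(int(diff.max()), 1)
    scaled = np.clip(diff.astype(np.float32) * (255.0 / max_diff), 0, 255).astype(np.uint8)
    return Image.fromarray(scaled)


if __name__ == "__main__":
    ela_image = error_level_analysis("lesion.jpg")  # hypothetical input file
    ela_image.save("lesion_ela.png")
```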
