Purpose of review: Skin type diversity in image datasets refers to the representation of various skin types. This diversity allows for the verification of comparable performance of a trained model across different skin types. A widespread problem in datasets involving human skin is the lack of verifiable diversity in skin types, making it difficult to evaluate whether the performance of the trained models generalizes across different skin types. For example, the diversity issues in skin lesion datasets, which are used to train deep learning-based models, often result in lower accuracy for darker skin types that are typically under-represented in these datasets. Under-representation in datasets results in lower performance in deep learning models for under-represented skin types.
Recent findings: This issue has been discussed in previous works; however, the reporting of skin types, and inherent diversity, have not been fully assessed. Some works report skin types but do not attempt to assess the representation of each skin type in datasets. Others, focusing on skin lesions, identify the issue but do not measure skin type diversity in the datasets examined.
Summary: Effort is needed to address these shortcomings and move towards facilitating verifiable diversity. Building on previous works in skin lesion datasets, this review explores the general issue of skin type diversity by investigating and evaluating skin lesion datasets specifically. The main contributions of this work are an evaluation of publicly available skin lesion datasets and their metadata to assess the frequency and completeness of reporting of skin type and an investigation into the diversity and representation of each skin type within these datasets.
Supplementary information: The online version contains material available at 10.1007/s13671-024-00440-0.