Andres Morales-Forero, Lili Rueda Jaime, Sebastian Ramiro Gil-Quiñones, Marlon Y. Barrera Montañez, Samuel Bassetto, Eric Coatanea
JEADV Clinical Practice, 3(3), pages 836-843. Published 2024-06-14. DOI: 10.1002/jvc2.477. https://onlinelibrary.wiley.com/doi/10.1002/jvc2.477
An insight into racial bias in dermoscopy repositories: A HAM10000 data set analysis
Background
Studies have revealed that patients with skin of colour are underrepresented in academic sources on dermatologic diseases, including databases. This visual racism has left specialists less comfortable and less confident in caring for these patients, and has reduced the patients' chances of being correctly diagnosed.
Objectives
To investigate potential racial biases in the HAM10000 data set through an exploratory analysis of how dark skin tones are represented, the identification of inaccuracies in its documentation, the recognition of relevant conditions of darker skin that are absent from it, and the identification of missing ethnic diversity variables that are crucial for validating diagnoses across different skin tones.
Methods
An exploratory analysis was conducted to assess how often dark skin appears in the HAM10000 database (hosted in a Harvard Dataverse repository), which consists of 10,015 dermoscopic images of skin lesions. A visual depiction of the full range of skin tones was generated by sampling four key data points from each image and applying the Gray World algorithm for colour normalization. To confirm the accuracy of this graphical representation, dermatologists validated the pixel sampling process by analysing a randomly selected 10% of the images for each type of skin lesion. The visual representation was produced for the entire data set and for each skin lesion type. In addition, the representation of skin lesions in HAM10000 was compared against documented prevalences of relevant conditions affecting dark skin.
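The colour normalization and sampling steps described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the abstract does not say where the four data points lie or whether normalization precedes sampling, so the near-corner positions, the `margin` parameter, and the function names `gray_world` and `sample_skin_tone` are all assumptions made for this example. Only the Gray World principle itself (scale each channel so that its mean matches the overall mean intensity) is standard.

```python
import numpy as np


def gray_world(image: np.ndarray) -> np.ndarray:
    """Gray World colour normalization for an H x W x 3 uint8 RGB image.

    Each channel is scaled so that its mean equals the mean of the three
    channel means, i.e. the average colour of the image becomes grey.
    """
    img = image.astype(np.float64)
    channel_means = img.reshape(-1, 3).mean(axis=0)
    gray = channel_means.mean()
    # Guard against division by zero for an all-black channel.
    gains = gray / np.maximum(channel_means, 1e-12)
    balanced = img * gains
    return np.clip(np.rint(balanced), 0, 255).astype(np.uint8)


def sample_skin_tone(image: np.ndarray, margin: float = 0.1) -> np.ndarray:
    """Average the RGB values of four sampled pixels.

    ASSUMPTION: the four points sit near the image corners, inset by
    `margin` of the image size, on the guess that the lesion is centred.
    The source does not specify the sampling positions.
    """
    h, w = image.shape[:2]
    dy, dx = int(h * margin), int(w * margin)
    points = [
        (dy, dx),
        (dy, w - 1 - dx),
        (h - 1 - dy, dx),
        (h - 1 - dy, w - 1 - dx),
    ]
    samples = np.array([image[y, x] for y, x in points], dtype=np.float64)
    return samples.mean(axis=0)


# Synthetic stand-in for a dermoscopic image with a strong colour cast.
demo = np.zeros((60, 80, 3), dtype=np.uint8)
demo[:] = (200, 100, 50)
normalized = gray_world(demo)
tone = sample_skin_tone(normalized)
```

Applied to every image, the resulting `tone` values could then be sorted or plotted to give an overview of the skin tones in a data set. As the Results note, corner pixels can fall on shadows or on the lesion itself, which is why manual validation of the sampling matters.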
Results
Less than 5% of the images came from dark-skinned patients. However, in about 4.9% of cases, our pixel sampling method may have captured shadows or dark spots produced by the imaging device or the lesion itself rather than the individual's actual skin tone. In addition, the data set's claims of diversity and comprehensive coverage are inaccurate, notably because conditions prevalent in darker skin are underrepresented and ethnic diversity variables are absent.
Conclusions
Visual racism is an issue that needs to be addressed in medical sources of information and education. Image databases and artificial intelligence models need to be built on data that cover all skin types in order to guarantee equal access to opportunities. Furthermore, any instance in which conditions affecting people of colour are underrepresented must be meticulously documented and reported so that these disparities can be highlighted and addressed effectively. This is particularly important in dermoscopy imaging, where an analysis of racial bias that relies solely on images is limited: the dermatoscope's lighting alters the patient's actual skin tone, which complicates the accurate assessment of racial bias.