Leonel Hidalgo, María Paz Salinas, Javiera Sepúlveda, Karina Carrasco, Pamela Romero, Alma Pedro, Soledad Vidaurre, Domingo Mery, Cristian Navarrete-Dechent
Title: Creating a dermatologic database for artificial intelligence, a Chilean experience, and advice from ChatGPT
Journal: JEADV clinical practice, volume 4, issue 1, pages 296-298
Publication type: Journal Article
Publication date: 2024-10-10
DOI: 10.1002/jvc2.546
URL: https://onlinelibrary.wiley.com/doi/10.1002/jvc2.546
Citation count: 0
Abstract
Since artificial intelligence (AI) has shown wide applicability in skin cancer diagnosis, creating comprehensive image datasets is key [1-4]. The availability of databases is increasing, but they have low representation of higher phototypes and certain ethnic groups, as well as limited metadata [5]. Excluding specific populations perpetuates healthcare disparities in the AI era [6]. Due to the lack of diverse datasets, external use and validation of AI algorithms are not currently possible in our population. We therefore started a project to create a Chilean AI database: the 'Trawa' database ('skin' in Mapuzungun, a native Chilean language). This study aims to describe the current characteristics of our dataset along with the limitations encountered during its creation.
This was a retrospective study approved by the local Institutional Review Board (IRB). The images were collected from January 2019 to December 2020 by four dermatologists working in a tertiary care academic hospital. Clinical and dermoscopy images were obtained with various smartphones. All included lesions are biopsy-proven. Metadata (i.e., age, sex, anatomical location, histopathological details, relevant past medical history, and phototype) were obtained from the electronic medical records. Each case was coded in its own folder. All data were stored on a Health Insurance Portability and Accountability Act (HIPAA)-compliant web hosting service.
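The metadata fields listed above map naturally onto a per-case record. The following is a minimal sketch of such a record, not the authors' actual storage layout, which the article does not describe. The class name `LesionCase`, the field names, the folder-style path, and the use of the six Fitzpatrick phototypes (I to VI) are all assumptions made for illustration; the diagnostic categories are the seven reported in the results.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class Diagnosis(Enum):
    """The seven diagnostic categories reported for the dataset."""
    ACTINIC_KERATOSIS = "actinic keratosis"
    BASAL_CELL_CARCINOMA = "basal cell carcinoma"
    SQUAMOUS_CELL_CARCINOMA = "cutaneous squamous cell carcinoma"
    MELANOMA = "melanoma"
    NAEVUS = "naevus"
    SEBORRHOEIC_KERATOSIS = "seborrhoeic keratosis"
    OTHER = "other"


# Assumption: phototype is recorded on the Fitzpatrick scale (I to VI).
PHOTOTYPES = ("I", "II", "III", "IV", "V", "VI")


@dataclass
class LesionCase:
    """Hypothetical record for one biopsy-proven case (names are illustrative)."""
    case_id: str
    age: int
    sex: str
    anatomical_location: str
    diagnosis: Diagnosis          # histopathological diagnosis
    phototype: str
    medical_history: str = ""
    image_paths: List[str] = field(default_factory=list)

    def __post_init__(self) -> None:
        # Reject values that cannot be valid metadata.
        if self.phototype not in PHOTOTYPES:
            raise ValueError(f"unknown phototype: {self.phototype!r}")
        if self.age < 0:
            raise ValueError("age must be non-negative")


example = LesionCase(
    case_id="case-0001",
    age=54,
    sex="F",
    anatomical_location="head and neck",
    diagnosis=Diagnosis.NAEVUS,
    phototype="III",
    image_paths=["case-0001/clinical_01.jpg", "case-0001/dermoscopy_01.jpg"],
)
```

Validating the phototype and age at construction time is one way to catch transcription errors when metadata are copied by hand from medical records.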
During the study period, we collected 860 individual cases consisting of 4435 clinical and dermoscopy images (Figure 1), organised in seven categories: actinic keratosis, basal cell carcinoma, cutaneous squamous cell carcinoma, melanoma, naevus, seborrhoeic keratosis and others (angiomas, warts, etc.) (Table 1). Regarding metadata, 52.6% of patients were women, the average age was 54 years, 32.8% had photodamage and 70.2% were phototype III. Most cases were located on the head and neck (50.6%), and 26.8% of the diagnoses were malignant.
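To give a sense of scale, the reported percentages can be converted into approximate case counts. This is a rough sketch that assumes every percentage uses all 860 cases as its denominator, which the text does not state explicitly; the resulting counts are therefore estimates and not figures from the article. The names `approx_count` and `REPORTED_PERCENTAGES` are illustrative.

```python
N_CASES = 860
N_IMAGES = 4435

# Percentages as reported in the text.
REPORTED_PERCENTAGES = {
    "women": 52.6,
    "photodamage": 32.8,
    "phototype III": 70.2,
    "head and neck": 50.6,
    "malignant": 26.8,
}


def approx_count(percent: float, total: int = N_CASES) -> int:
    """Estimate a case count from a percentage (assumes `total` is the denominator)."""
    return round(total * percent / 100)


images_per_case = N_IMAGES / N_CASES
approx_counts = {label: approx_count(p) for label, p in REPORTED_PERCENTAGES.items()}

if __name__ == "__main__":
    print(f"{images_per_case:.2f} images per case on average")
    for label, count in approx_counts.items():
        print(f"{label}: about {count} of {N_CASES} cases")
```

Under that assumption, the dataset averages roughly five images per case, and about 230 cases carry a malignant diagnosis.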
Finally, we also suggest working with multidisciplinary teams composed of dermatologists and computer science professionals. Creating and improving databases will improve the performance of AI algorithms [9], and for us, this is a necessary step towards collaborative work with other countries in the region (e.g., Latin America) [3]. Potential applications of the current database include algorithm training fine-tuned for local data, as well as comparing the performance of different algorithms across different and diverse databases. The main limitation of our database is its relatively small size. Organising lesions requires a large team and multiple resources. Also, we have included only lesions with histopathological confirmation, which biases the database towards more 'suspicious' lesions. Using noninvasive imaging technologies such as reflectance confocal microscopy could be an alternative way to include nonbiopsied benign lesions [10].
Acquisition, analysis, and interpretation of data: Leonel Hidalgo, María Paz Salinas, Javiera Sepúlveda, Karina Carrasco, Pamela Romero, Alma Pedro, Soledad Vidaurre, Domingo Mery and Cristian Navarrete-Dechent. Drafting and revising the article: Leonel Hidalgo, María Paz Salinas, Javiera Sepúlveda, Karina Carrasco, Pamela Romero, Alma Pedro, Soledad Vidaurre, Domingo Mery and Cristian Navarrete-Dechent. Final approval of the version to be published: Leonel Hidalgo, María Paz Salinas, Javiera Sepúlveda, Karina Carrasco, Pamela Romero, Alma Pedro, Soledad Vidaurre, Domingo Mery and Cristian Navarrete-Dechent.
This work was funded in part by ANID—Millennium Science Initiative Programme ICN2021_004.
The authors declare no conflict of interest.
Reviewed and approved by the Scientific Ethical Committee for Health Sciences of Pontificia Universidad Católica de Chile; approval #211213001.