Public data homogenization for AI model development in breast cancer

European Radiology Experimental (IF 3.7, JCR Q1, Radiology, Nuclear Medicine & Medical Imaging). Pub Date: 2024-04-09. DOI: 10.1186/s41747-024-00442-4

Vassilis Kilintzis, Varvara Kalokyri, Haridimos Kondylakis, Smriti Joshi, Katerina Nikiforaki, Oliver Díaz, Karim Lekadir, Manolis Tsiknakis, Kostas Marias

Public data homogenization for AI model development in breast cancer

Background

Developing trustworthy artificial intelligence (AI) models for clinical applications requires access to clinical and imaging data cohorts. Reusing publicly available datasets has the potential to fill this gap. In the breast cancer domain specifically, The Cancer Imaging Archive (TCIA) hosts a large archive of publicly accessible medical images along with the corresponding clinical data. However, the existing datasets cannot be used directly: they are heterogeneous and cannot be effectively filtered to select the specific image types required to develop AI models. This work focuses on the development of a homogenized breast cancer dataset that includes both clinical and imaging data.

Methods

Five datasets were acquired from the TCIA and harmonized. To harmonize the clinical data, a common data model was developed, and a repeatable, documented “extract-transform-load” process was defined and executed. Further, Digital Imaging and Communications in Medicine (DICOM) information was extracted from the magnetic resonance imaging (MRI) data and made accessible and searchable.
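As a rough illustration of what an “extract-transform-load” step into a common data model can look like, the sketch below maps rows from two source tables onto one shared record type. Everything in it is an assumption made for illustration: the dataset names, column names, value encodings, and the `CommonRecord` fields are hypothetical and are not taken from the paper or from any TCIA collection.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CommonRecord:
    """One subject in an assumed, heavily simplified common data model."""
    subject_id: str
    source_dataset: str
    age_years: Optional[float]
    er_positive: Optional[bool]


# Hypothetical per-dataset mapping: source column -> common-model field.
COLUMN_MAPS = {
    "dataset_a": {"PatientID": "subject_id", "Age": "age_years", "ER": "er_positive"},
    "dataset_b": {"id": "subject_id", "age_at_dx_days": "age_years", "er_status": "er_positive"},
}


def to_bool(value):
    """Normalize heterogeneous positive/negative encodings; unknown values become None."""
    text = str(value).strip().lower()
    if text in {"1", "pos", "positive", "yes", "true"}:
        return True
    if text in {"0", "neg", "negative", "no", "false"}:
        return False
    return None


def transform(dataset, row):
    """Rename source columns to common-model fields and normalize their values."""
    mapped = {target: row.get(source) for source, target in COLUMN_MAPS[dataset].items()}
    age = mapped.get("age_years")
    age = float(age) if age not in (None, "") else None
    # Assumption: dataset_b reports age in days, so convert it to years.
    if dataset == "dataset_b" and age is not None:
        age = round(age / 365.25, 1)
    return CommonRecord(
        subject_id=f"{dataset}:{mapped['subject_id']}",  # prefix avoids ID clashes across sources
        source_dataset=dataset,
        age_years=age,
        er_positive=to_bool(mapped.get("er_positive")),
    )


def run_etl(sources):
    """Extract rows from each source, transform them, and load them into one list."""
    return [transform(name, row) for name, rows in sources.items() for row in rows]
```

Keeping the column mappings in a declarative table, separate from the transformation code, is one way to make such a process repeatable and easy to document.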

Results

The resulting harmonized dataset includes information about 2,035 subjects with breast cancer. Further, a platform named RV-Cherry-Picker enables search over both the clinical and the diagnostic imaging datasets. It provides unified access, facilitates downloading all study images that match specific series characteristics (e.g., dynamic contrast-enhanced series), and reduces the burden of acquiring the appropriate set of images for a given AI model scenario.
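To make the idea of selecting series by their characteristics concrete, the following sketch filters an in-memory index of series-level DICOM metadata. It is not the RV-Cherry-Picker implementation: the index layout, the `looks_like_dce` keyword heuristic, and the function names are assumptions for illustration. Only the attribute keywords (`Modality`, `SeriesDescription`, `PatientID`, `SeriesInstanceUID`) are standard DICOM names.

```python
from collections import defaultdict


def looks_like_dce(series):
    """Assumed heuristic: an MR series whose description mentions DCE or 'dyn'."""
    if series.get("Modality") != "MR":
        return False
    description = series.get("SeriesDescription", "").lower()
    return any(keyword in description for keyword in ("dce", "dyn"))


def select_series(index, predicate):
    """Group the SeriesInstanceUIDs that satisfy the predicate by PatientID."""
    selected = defaultdict(list)
    for series in index:
        if predicate(series):
            selected[series["PatientID"]].append(series["SeriesInstanceUID"])
    return dict(selected)


# Illustrative index entries; the identifiers are placeholders, not real data.
example_index = [
    {"PatientID": "P1", "SeriesInstanceUID": "1.2.3.1", "Modality": "MR",
     "SeriesDescription": "Ax DYN 3D post-contrast"},
    {"PatientID": "P1", "SeriesInstanceUID": "1.2.3.2", "Modality": "MR",
     "SeriesDescription": "T2 FSE"},
    {"PatientID": "P2", "SeriesInstanceUID": "1.2.4.1", "Modality": "MG",
     "SeriesDescription": "Dynamic"},
]

dce_by_patient = select_series(example_index, looks_like_dce)
```

In practice, series descriptions are free text and vary between sites and scanners, so a keyword heuristic like this would likely need to be combined with other acquisition attributes.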

Conclusions

RV-Cherry-Picker provides access to the largest publicly available, homogenized imaging/clinical dataset for breast cancer, on which AI models can be developed.

Relevance statement

We present a solution for creating merged public datasets that support AI model development, using the breast cancer domain and magnetic resonance images as an example.

Key points

• The proposed platform allows unified access to the largest, homogenized public imaging dataset for breast cancer.

• A methodology for the semantically enriched homogenization of public clinical data is presented.

• The platform enables detailed selection of breast MRI data for the development of AI models.
