Clinician-driven automated data preprocessing in nuclear medicine AI environments

European Journal of Nuclear Medicine and Molecular Imaging (IF 7.6; Zone 1, Medicine; JCR Q1, Radiology, Nuclear Medicine & Medical Imaging) Pub Date: 2025-03-07 DOI: 10.1007/s00259-025-07183-5

Denis Krajnc, Clemens P. Spielvogel, Boglarka Ecsedi, Zsombor Ritter, H. Alizadeh, Marcus Hacker, Laszlo Papp



Abstract

Background

Artificial intelligence (AI) approaches in clinical science require extensive data preprocessing (DP) before AI models can be built. Establishing DP pipelines is a non-trivial task that is mainly driven by purely mathematical rules and carried out by data scientists. Nevertheless, clinician involvement should be paramount at this step. This study proposes a data preprocessing approach driven by clinical domain knowledge, in which clinician input, in the form of explicit and non-explicit rules, directly impacts the algorithms' decision-making processes, thus making the DP planning phase more inclusive for clinicians.

Methods

The rule set table (RST) was introduced as an interface that accepts clinicians' input as formal rules in human-readable form (with four actions: exp-keep, exp-remove, pref-keep, and pref-remove, applied to features or samples) and translates it into machine-readable input for preprocessing algorithms. A collection of commonly used algorithms was incorporated for data preprocessing of various clinical cohorts in both single- and multi-center scenarios. The impact of the RST was evaluated using a 100-fold Monte Carlo cross-validation scheme with an 80%/20% training-testing split for the prostate and glioma cohorts (single center). Furthermore, a diffuse large B-cell lymphoma (DLBCL) cohort was evaluated using Center 1 as the training cohort and Center 2 as the testing cohort for clinical endpoint prediction. Both scenarios were investigated in manual and automated data preprocessing setups across all cohorts. The XGBoost algorithm was employed for the classification tasks across all established models. Predictive performance was estimated by confusion matrix analysis in the validation samples of all cohorts. The performance with the RST across all actions, as well as without the RST, was compared in both manual and automated settings for each cohort.
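To make the idea of a rule set table concrete, the following is a minimal sketch and not the authors' implementation. It assumes that "exp-" rules are explicit hard constraints (a feature is always kept or always removed) and that "pref-" rules are soft preferences that nudge a data-driven ranking without overriding it; the abstract does not specify these semantics. The feature names, the relevance scores, the `pref_bonus` parameter, and the functions `parse_rst` and `select_features` are all hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    """The four RST actions named in the abstract."""
    EXP_KEEP = "exp-keep"
    EXP_REMOVE = "exp-remove"
    PREF_KEEP = "pref-keep"
    PREF_REMOVE = "pref-remove"


@dataclass(frozen=True)
class Rule:
    target: str      # feature name (hypothetical examples below)
    action: Action


def parse_rst(rows):
    """Translate human-readable (target, action) rows into Rule objects."""
    return [Rule(target.strip(), Action(action.strip().lower()))
            for target, action in rows]


def select_features(scores, rules, k, pref_bonus=0.1):
    """Pick k features by relevance score while honoring the rules.

    Assumed semantics (not confirmed by the source): exp-* rules are hard
    constraints; pref-* rules add or subtract a fixed bonus from the score.
    """
    by_target = {rule.target: rule.action for rule in rules}
    forced, candidates = [], {}
    for name, score in scores.items():
        action = by_target.get(name)
        if action is Action.EXP_REMOVE:
            continue                      # never selectable
        if action is Action.EXP_KEEP:
            forced.append(name)           # always selected
            continue
        if action is Action.PREF_KEEP:
            score += pref_bonus
        elif action is Action.PREF_REMOVE:
            score -= pref_bonus
        candidates[name] = score
    ranked = sorted(candidates, key=candidates.get, reverse=True)
    return forced + ranked[:max(0, k - len(forced))]


# Hypothetical relevance scores from some purely data-driven ranking step.
scores = {"suv_max": 0.30, "psa": 0.25, "patient_id": 0.40,
          "age": 0.28, "texture_7": 0.33}
rules = parse_rst([
    ("psa", "exp-keep"),
    ("patient_id", "exp-remove"),
    ("age", "pref-keep"),
    ("texture_7", "pref-remove"),
])

without_rst = select_features(scores, [], k=3)
with_rst = select_features(scores, rules, k=3)
```

In this toy setup the purely score-based selection keeps `patient_id`, an identifier with no clinical meaning, whereas the rule-aware selection drops it and retains the clinically motivated `psa`. This illustrates the kind of effect a clinician-supplied rule could have, under the assumptions stated above.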

Results

The performance of machine learning (ML) models with manual preprocessing combined with the RST increased by up to 18% balanced accuracy (BACC) compared to models without the RST. The ML models with "exp-keep" and "pref-keep" instructions showed the highest performance increases of +18% BACC (glioma), +6% BACC (prostate), and +3% BACC (DLBCL) compared to other models across all datasets.
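For readers unfamiliar with the evaluation terms used above, the sketch below shows how balanced accuracy can be derived from a binary confusion matrix and how repeated random 80/20 splits (Monte Carlo cross-validation) can be generated. It is an illustration only: the function names, the fixed seed, and the unstratified shuffling are assumptions, and the abstract does not state whether the study stratified its splits or how multi-class endpoints were handled.

```python
import random


def balanced_accuracy(tp, fn, tn, fp):
    """Mean of sensitivity and specificity from a binary confusion matrix."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return (sensitivity + specificity) / 2


def monte_carlo_splits(n_samples, n_folds=100, test_fraction=0.2, seed=0):
    """Yield (train_idx, test_idx) pairs for repeated random splits.

    Unlike k-fold cross-validation, each fold is an independent random
    draw, so test sets of different folds may overlap.
    """
    rng = random.Random(seed)
    n_test = round(n_samples * test_fraction)
    indices = list(range(n_samples))
    for _ in range(n_folds):
        shuffled = indices[:]
        rng.shuffle(shuffled)
        yield shuffled[n_test:], shuffled[:n_test]


# A classifier that labels every case negative on a 10-positive /
# 90-negative cohort reaches 90% plain accuracy but only 50% BACC.
plain_accuracy = (0 + 90) / 100
bacc = balanced_accuracy(tp=0, fn=10, tn=90, fp=0)

splits = list(monte_carlo_splits(n_samples=50))
```

Balanced accuracy is a common choice for clinical cohorts with unequal class sizes because, as the toy numbers show, it does not reward a model for simply predicting the majority class.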

Conclusion

The study demonstrated the added value of the RST for the predictive performance of oncology-specific ML models, thus serving as a proof of concept for a more inclusive, clinician-driven DP process in future studies.




Source journal

European Journal of Nuclear Medicine and Molecular Imaging (Medicine: Nuclear Medicine)

CiteScore

15.60

Self-citation rate

9.90%

Articles published

392

Review time

3 months

Journal description: The European Journal of Nuclear Medicine and Molecular Imaging serves as a platform for the exchange of clinical and scientific information within nuclear medicine and related professions. It welcomes international submissions from professionals involved in the functional, metabolic, and molecular investigation of diseases. The journal's coverage spans physics, dosimetry, radiation biology, radiochemistry, and pharmacy, providing high-quality peer review by experts in the field. Known for highly cited and downloaded articles, it ensures global visibility for research work and is part of the EJNMMI journal family.