Denis Krajnc, Clemens P. Spielvogel, Boglarka Ecsedi, Zsombor Ritter, H. Alizadeh, Marcus Hacker, Laszlo Papp
Clinician-driven automated data preprocessing in nuclear medicine AI environments
European Journal of Nuclear Medicine and Molecular Imaging, 2025-03-07 (journal article)
DOI: 10.1007/s00259-025-07183-5
Citations: 0
Abstract
Background
Artificial Intelligence (AI) approaches in clinical science require extensive data preprocessing (DP) before AI models can be built. Establishing DP pipelines is a non-trivial task that is mainly driven by purely mathematical rules and carried out by data scientists. Nevertheless, clinician involvement should be paramount at this step. This study proposes a DP approach driven by clinical domain knowledge, in which clinician input, in the form of explicit and non-explicit rules, directly affects the algorithms' decision-making, thereby making the DP planning phase more inclusive for clinicians.
Methods
The rule set table (RST) was introduced as an interface that accepts clinicians' input as formal rules in human-readable form (four actions: exp-keep, exp-remove, pref-keep, and pref-remove, each applicable to features or samples) and translates it into machine-readable input for preprocessing algorithms. A collection of commonly used algorithms was incorporated for preprocessing data from various clinical cohorts in both single- and multi-center scenarios. For the prostate and glioma cohorts (single center), the impact of the RST was evaluated with 100-fold Monte Carlo cross-validation using an 80/20% training/testing split. The diffuse large B-cell lymphoma (DLBCL) cohort was evaluated for clinical endpoint prediction, using Center 1 as the training cohort and Center 2 as the testing cohort. Both scenarios were investigated in manual and automated data preprocessing setups across all cohorts. The XGBoost algorithm was used for classification in all established models. Predictive performance was estimated by confusion matrix analysis in the validation samples of all cohorts. Performance with the RST (for each action) and without the RST was compared in both manual and automated settings for each cohort.
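The abstract does not specify how RST rules are encoded or how each preprocessing algorithm consumes them. The following is a minimal Python sketch under stated assumptions, not the authors' implementation: a rule is a hypothetical comma-separated row `target, name, action`; explicit rules (`exp-keep`, `exp-remove`) override the algorithm outright; and preference rules (`pref-keep`, `pref-remove`) only decide borderline items whose algorithm score lies within a hypothetical `margin` of the selection `threshold`. The names `Rule`, `parse_rule`, and `select`, the scores, and the feature names are all illustrative.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    """The four RST actions named in the abstract."""
    EXP_KEEP = "exp-keep"        # explicit: always keep
    EXP_REMOVE = "exp-remove"    # explicit: always remove
    PREF_KEEP = "pref-keep"      # preference: keep if the item is borderline
    PREF_REMOVE = "pref-remove"  # preference: remove if the item is borderline


EXPLICIT = {Action.EXP_KEEP, Action.EXP_REMOVE}


@dataclass(frozen=True)
class Rule:
    target: str  # "feature" or "sample" (hypothetical encoding)
    name: str    # feature name or sample ID
    action: Action


def parse_rule(row: str) -> Rule:
    """Parse a hypothetical human-readable row like 'feature, SUVmax, exp-keep'."""
    target, name, action = (part.strip() for part in row.split(","))
    if target not in ("feature", "sample"):
        raise ValueError(f"unknown target: {target!r}")
    return Rule(target, name, Action(action))  # Action() raises ValueError on unknown actions


def select(scores: dict[str, float], rules: list[Rule], target: str = "feature",
           threshold: float = 0.5, margin: float = 0.1) -> list[str]:
    """Return the names kept after merging algorithm scores with clinician rules.

    Hypothetical semantics: explicit rules override the score; preference
    rules apply only when |score - threshold| <= margin. If the same name
    has several rules of one kind, the last one wins.
    """
    explicit = {r.name: r.action for r in rules
                if r.target == target and r.action in EXPLICIT}
    preferred = {r.name: r.action for r in rules
                 if r.target == target and r.action not in EXPLICIT}
    kept = []
    for name, score in scores.items():
        if name in explicit:
            keep = explicit[name] is Action.EXP_KEEP
        elif name in preferred and abs(score - threshold) <= margin:
            keep = preferred[name] is Action.PREF_KEEP
        else:
            keep = score >= threshold  # the algorithm's own decision
        if keep:
            kept.append(name)
    return kept


# Usage with made-up relevance scores from some feature-ranking algorithm.
scores = {"SUVmax": 0.20, "age": 0.55, "volume": 0.45, "scanner_id": 0.90}
rules = [parse_rule(row) for row in (
    "feature, SUVmax, exp-keep",        # kept despite a low score
    "feature, scanner_id, exp-remove",  # removed despite a high score
    "feature, age, pref-remove",        # borderline, so the preference applies
    "feature, volume, pref-keep",       # borderline, so the preference applies
)]
print(select(scores, rules))  # ['SUVmax', 'volume']
```

Keeping explicit and preference rules in separate lookups makes the precedence visible: explicit rules always win, while preference rules only tip borderline decisions. Without any rules, the same scores would keep `age` and `scanner_id` instead. A real RST would also need conflict checks (for example, one feature marked both exp-keep and exp-remove), which this sketch resolves simply by letting the last rule win.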
Results
Combining manual preprocessing with the RST improved ML model performance by up to 18% balanced accuracy (BACC) compared with models without the RST. Models using the “exp-keep” and “pref-keep” instructions showed the highest performance increase compared with other models across all datasets: +18% BACC (glioma), +6% BACC (prostate), and +3% BACC (DLBCL).
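The results are reported as balanced accuracy (BACC) derived from confusion matrix analysis. For a binary task, BACC is the mean of sensitivity and specificity, so a model cannot score well on an imbalanced cohort simply by predicting the majority class. The sketch below assumes binary labels with 1 as the positive class; the function names and labels are illustrative and this is not the authors' evaluation code.

```python
def confusion_counts(y_true: list[int], y_pred: list[int], positive: int = 1):
    """Return (tp, fn, tn, fp) for a binary classification task."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    tp = fn = tn = fp = 0
    for t, p in zip(y_true, y_pred):
        if t == positive:
            if p == positive:
                tp += 1
            else:
                fn += 1
        elif p == positive:
            fp += 1
        else:
            tn += 1
    return tp, fn, tn, fp


def balanced_accuracy(y_true: list[int], y_pred: list[int], positive: int = 1) -> float:
    """Mean of sensitivity (recall on positives) and specificity (recall on negatives)."""
    tp, fn, tn, fp = confusion_counts(y_true, y_pred, positive)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both classes must be present in y_true")
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return (sensitivity + specificity) / 2


# Made-up validation labels: 4 positives, 6 negatives.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 1]
print(confusion_counts(y_true, y_pred))             # (3, 1, 4, 2)
print(round(balanced_accuracy(y_true, y_pred), 4))  # 0.7083
```

Here plain accuracy would be 7/10 = 0.70, whereas BACC weights the smaller positive class equally with the negatives. For multiclass tasks, BACC generalizes to the mean per-class recall; scikit-learn's `balanced_accuracy_score` computes the same quantity.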
Conclusion
The study demonstrated the added value of the RST for the predictive performance of oncology-specific ML models and thus serves as a proof of concept for a more inclusive, clinician-driven DP process in future studies.
About the journal:
The European Journal of Nuclear Medicine and Molecular Imaging serves as a platform for the exchange of clinical and scientific information within nuclear medicine and related professions. It welcomes international submissions from professionals involved in the functional, metabolic, and molecular investigation of diseases. The journal's coverage spans physics, dosimetry, radiation biology, radiochemistry, and pharmacy, providing high-quality peer review by experts in the field. Known for highly cited and downloaded articles, it ensures global visibility for research work and is part of the EJNMMI journal family.