Clinician-driven automated data preprocessing in nuclear medicine AI environments

Impact factor: 7.6; Zone 1 (Medicine); JCR Q1 (Radiology, Nuclear Medicine & Medical Imaging); European Journal of Nuclear Medicine and Molecular Imaging; Pub Date: 2025-03-07; DOI: 10.1007/s00259-025-07183-5
Denis Krajnc, Clemens P. Spielvogel, Boglarka Ecsedi, Zsombor Ritter, H. Alizadeh, Marcus Hacker, Laszlo Papp
Citations: 0

Abstract

Background

Artificial Intelligence (AI) approaches in clinical science require extensive data preprocessing (DP) steps prior to building AI models. Establishing DP pipelines is a non-trivial task that is driven mainly by purely mathematical rules and carried out by data scientists. Nevertheless, clinician involvement should be paramount at this step. The study proposes a data preprocessing approach driven by clinical domain knowledge, in which clinician input, in the form of explicit and non-explicit rules, directly affects the algorithms’ decision-making processes, thus making the DP planning phase more inclusive for clinicians.

Methods

The rule set table (RST) was introduced as an interface that accepts the clinician’s input as formal rules in human-readable form (with four actions, exp-keep, exp-remove, pref-keep and pref-remove, applied to features or samples) and translates it into machine-readable input for preprocessing algorithms. A collection of commonly used algorithms was incorporated for data preprocessing of various clinical cohorts in both single- and multi-center scenarios. The impact of RST was evaluated using a 100-fold Monte Carlo cross-validation scheme with an 80-20% training-testing split for the prostate and glioma cohorts (single center). Furthermore, the diffuse large B-cell lymphoma (DLBCL) cohort was evaluated by using Center 1 as the training cohort and Center 2 as the testing cohort for clinical endpoint prediction. Both scenarios were investigated in manual and automated data preprocessing setups across all cohorts. The XGBoost algorithm was employed for classification tasks across all established models. Predictive performance was estimated by confusion matrix analysis in the validation samples of all cohorts. Performance with RST across all actions and performance without RST were compared in both manual and automated settings for each respective cohort.
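As a rough illustration of the RST idea, the following Python sketch shows how human-readable rules might be translated into constraints for a feature-selection step. It is not the authors' implementation. The `Rule` class, the `apply_rules` function, the `pref_weight` parameter, the feature names and scores, and the reading of "exp-" as a hard constraint and "pref-" as a soft preference are all assumptions made for illustration; only the four action names come from the abstract.

```python
from dataclasses import dataclass

# Action names are taken from the abstract; their semantics below are assumed.
ACTIONS = {"exp-keep", "exp-remove", "pref-keep", "pref-remove"}


@dataclass(frozen=True)
class Rule:
    target: str  # hypothetical feature name
    action: str  # one of ACTIONS

    def __post_init__(self):
        if self.action not in ACTIONS:
            raise ValueError(f"unknown action: {self.action}")


def apply_rules(scores, rules, n_select, pref_weight=0.5):
    """Select n_select features given relevance scores and clinician rules.

    `scores` maps feature name -> non-negative relevance score produced by
    any ranking algorithm. Assumed semantics: exp-keep forces inclusion,
    exp-remove forces exclusion, pref-keep / pref-remove scale the score
    up or down by pref_weight before ranking.
    """
    forced, banned, adjusted = [], set(), dict(scores)
    for rule in rules:
        if rule.target not in scores:
            continue  # rule refers to a feature absent from this cohort
        if rule.action == "exp-keep":
            forced.append(rule.target)
        elif rule.action == "exp-remove":
            banned.add(rule.target)
        elif rule.action == "pref-keep":
            adjusted[rule.target] *= 1 + pref_weight
        else:  # pref-remove
            adjusted[rule.target] *= 1 - pref_weight

    # Deduplicate forced features; an explicit removal wins over a keep.
    forced = [f for f in dict.fromkeys(forced) if f not in banned]
    candidates = [f for f in adjusted if f not in banned and f not in forced]
    candidates.sort(key=lambda f: adjusted[f], reverse=True)
    remaining = max(n_select - len(forced), 0)
    return forced + candidates[:remaining]


# Hypothetical scores and rules, for illustration only.
scores = {"SUVmax": 0.40, "age": 0.30, "scanner_id": 0.90, "MTV": 0.35, "weight": 0.32}
rules = [
    Rule("SUVmax", "exp-keep"),
    Rule("scanner_id", "exp-remove"),
    Rule("MTV", "pref-keep"),
    Rule("weight", "pref-remove"),
]
selected = apply_rules(scores, rules, n_select=3)
print(selected)  # ['SUVmax', 'MTV', 'age']
```

In this sketch the purely data-driven ranking would have picked `scanner_id` first; the explicit rules override that choice, while the preference rules only shift the ranking of the remaining candidates.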

Results

ML models with manual preprocessing combined with RST gained up to 18% in balanced accuracy (BACC) compared to models without RST. The ML models with “exp-keep” and “pref-keep” instructions showed the highest performance increases of +18% BACC (glioma), +6% BACC (prostate) and +3% BACC (DLBCL) compared to other models across all datasets.
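For readers unfamiliar with the metric, balanced accuracy is the mean of the per-class recalls read off the confusion matrix, so it is not inflated by class imbalance. The sketch below computes it in plain Python and also generates repeated random 80-20 splits in the spirit of Monte Carlo cross-validation. The function names, the seed, and the toy labels are assumptions for illustration; whether the authors stratified their splits is not stated in the abstract, and this sketch does not stratify.

```python
import random


def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall, computed from confusion-matrix counts."""
    recalls = []
    for c in sorted(set(y_true)):
        total = sum(1 for t in y_true if t == c)
        hits = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        recalls.append(hits / total)
    return sum(recalls) / len(recalls)


def monte_carlo_splits(n_samples, n_folds=100, test_fraction=0.2, seed=0):
    """Yield (train_idx, test_idx) pairs from repeated random splits."""
    rng = random.Random(seed)
    n_test = round(n_samples * test_fraction)
    for _ in range(n_folds):
        idx = list(range(n_samples))
        rng.shuffle(idx)
        yield idx[n_test:], idx[:n_test]


# Toy imbalanced example: 8 negatives, 2 positives, one positive missed.
y_true = [0] * 8 + [1] * 2
y_pred = [0] * 8 + [1, 0]
plain_accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
bacc = balanced_accuracy(y_true, y_pred)
print(plain_accuracy, bacc)  # 0.9 0.75

splits = list(monte_carlo_splits(n_samples=50))
print(len(splits), len(splits[0][0]), len(splits[0][1]))  # 100 40 10
```

The toy example shows why BACC is the more conservative figure: plain accuracy is 0.9, but because half of the minority class is missed, balanced accuracy is 0.75.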

Conclusion

The study demonstrated the added value of RST for the predictive performance of oncology-specific ML models, thereby serving as a proof of concept for a more inclusive, clinician-driven DP process in future studies.

Source journal
CiteScore: 15.60
Self-citation rate: 9.90%
Articles published: 392
Review time: 3 months
Journal description: The European Journal of Nuclear Medicine and Molecular Imaging serves as a platform for the exchange of clinical and scientific information within nuclear medicine and related professions. It welcomes international submissions from professionals involved in the functional, metabolic, and molecular investigation of diseases. The journal's coverage spans physics, dosimetry, radiation biology, radiochemistry, and pharmacy, providing high-quality peer review by experts in the field. Known for highly cited and downloaded articles, it ensures global visibility for research work and is part of the EJNMMI journal family.