The impact of high-order features on performance of radiomics studies in CT non-small cell lung cancer
Journal: Clinical Imaging (impact factor 1.8; JCR Q3, Radiology, Nuclear Medicine & Medical Imaging)
Publication date: 2024-07-30
Publication type: Journal Article
DOI: 10.1016/j.clinimag.2024.110244
URL: https://www.sciencedirect.com/science/article/pii/S0899707124001748
Citation count: 0
Abstract
High-order radiomic features have been shown to produce high-performance models in a variety of scenarios. However, models trained without high-order features have shown similar performance, raising the question of whether high-order features are worth including given their increased computational burden. This comparative study investigates the impact of high-order features on model performance in CT-based non-small cell lung cancer (NSCLC) and the potential uncertainty regarding their application in machine learning. Three categories of features were retrospectively retrieved from CT images of 347 NSCLC patients: first- and second-order statistical features, morphological features, and transform (high-order) features. From these, three datasets were constructed: a “low-order” dataset (Lo), which included the first-order, second-order, and morphological features; a high-order dataset (Hi); and a combined dataset (Combo). A diverse selection of datasets, feature selection methods, and predictive models was included in the uncertainty analysis, with two-year survival as the study endpoint. AUC values were calculated for comparison, and Kruskal-Wallis testing was performed to identify significant differences. The Hi (AUC: 0.41–0.62) and Combo (AUC: 0.41–0.62) datasets yielded significantly (P < 0.01) higher model performance than the Lo dataset (AUC: 0.42–0.58). High-order features were selected more often than low-order features for model training, comprising 87% of selected features in the Combo dataset. High-order features are a source of data that can improve machine learning model performance. However, their impact depends strongly on various factors, which may lead to inconsistent results. A clear approach to incorporating high-order features in radiomic studies requires further investigation.
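The comparison step described in the abstract (an AUC per model, then a Kruskal-Wallis test across the three datasets) can be illustrated with a minimal sketch. The code below is not the authors' pipeline: the function names are hypothetical, the AUC lists are invented for illustration and are not the study's results, and the test is written out by hand for exactly three groups without a tie correction.

```python
import math
from itertools import chain


def auc(labels, scores):
    """AUC as the probability that a positive case outranks a negative one (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_ranks(values):
    """1-based ranks; tied values share their mean rank."""
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def kruskal_wallis_3(a, b, c):
    """H statistic and p-value for exactly three groups (df = 2), no tie correction."""
    groups = [a, b, c]
    pooled = list(chain.from_iterable(groups))
    n = len(pooled)
    ranks = average_ranks(pooled)
    total, start = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        total += rank_sum ** 2 / len(g)
        start += len(g)
    h = 12.0 / (n * (n + 1)) * total - 3 * (n + 1)
    # With 2 degrees of freedom the chi-square survival function is exp(-h / 2).
    return h, math.exp(-h / 2)


# Hypothetical per-model AUC values, invented for illustration only.
lo = [0.50, 0.52, 0.49, 0.51]
hi = [0.55, 0.57, 0.56, 0.58]
combo = [0.54, 0.59, 0.53, 0.60]

h_stat, p_value = kruskal_wallis_3(lo, hi, combo)
print(f"H = {h_stat:.3f}, p = {p_value:.4f}")
```

The closed-form p-value only holds because three groups give two degrees of freedom. For other group counts, or for data with many ties, a statistics library's Kruskal-Wallis routine would be the safer choice.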
About the journal:
The mission of Clinical Imaging is to publish, in a timely manner, the very best radiology research from the United States and around the world, with special attention to the impact of medical imaging on patient care. The journal's publications cover all imaging modalities, radiology issues related to patients, policy and practice improvements, and clinically oriented imaging physics and informatics. The journal is a valuable resource for practicing radiologists, radiologists-in-training, and other clinicians with an interest in imaging. Papers are carefully peer-reviewed and selected by our experienced subject editors, who are leading experts spanning the range of imaging sub-specialties, which include:
- Body Imaging
- Breast Imaging
- Cardiothoracic Imaging
- Imaging Physics and Informatics
- Molecular Imaging and Nuclear Medicine
- Musculoskeletal and Emergency Imaging
- Neuroradiology
- Practice, Policy & Education
- Pediatric Imaging
- Vascular and Interventional Radiology