A Gaussian process embedded feature selection method based on automatic relevance determination
Authors: Yushi Deng, Mario Eden, Selen Cremaschi
Journal: Computers & Chemical Engineering, volume 191, Article 108852
Publication date: 2024-08-23
DOI: 10.1016/j.compchemeng.2024.108852
URL: https://www.sciencedirect.com/science/article/pii/S0098135424002709
Citations: 0
Abstract
In Gaussian process models with an Automatic Relevance Determination (ARD) structured kernel function, the importance of a feature is inversely proportional to its length scale, so features can be selected by ranking them according to their importance. However, among ARD-based feature selection methods, no uniform score exists for quantifying the output variation explained by a feature subset. This study proposes two feature selection approaches that determine the optimal feature subset using two cumulative feature importance scores: the derivative decomposition ratio and the normalized sensitivity. The approaches are assessed on whether they accurately identify irrelevant features and rank the features correctly. They are then applied to identify relevant dimensionless inputs for a hybrid model that estimates the liquid entrainment fraction in two-phase flow. The results show that the proposed methods can identify the optimal feature subset for the hybrid model without significantly worsening its Root Mean Squared Error.
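The inverse relation between length scale and importance described in the abstract can be illustrated with a minimal sketch. It is not the paper's derivative decomposition ratio or normalized sensitivity. It takes importance as the inverse of each length scale, normalizes the values to shares, and keeps the top-ranked features until a cumulative share is reached. The function name `rank_by_ard`, the 0.95 threshold, and the example length scales are all hypothetical choices made for illustration.

```python
def rank_by_ard(length_scales, names=None, threshold=0.95):
    """Rank features by inverse ARD length scale and pick a subset.

    The cumulative share and the default threshold are illustrative
    assumptions, not the scores proposed in the paper.
    """
    if names is None:
        names = [f"x{i}" for i in range(len(length_scales))]
    # Importance is taken as the inverse of each length scale.
    importance = [1.0 / ls for ls in length_scales]
    total = sum(importance)
    shares = [imp / total for imp in importance]
    # Sort feature indices from most to least important.
    order = sorted(range(len(shares)), key=lambda i: shares[i], reverse=True)
    selected, cumulative = [], 0.0
    for i in order:
        selected.append(names[i])
        cumulative += shares[i]
        if cumulative >= threshold:
            break
    ranking = [(names[i], shares[i]) for i in order]
    return ranking, selected


# Hypothetical length scales, as if read from a fitted ARD kernel.
ranking, selected = rank_by_ard([0.5, 2.0, 100.0], threshold=0.95)
for name, share in ranking:
    print(f"{name}: {share:.3f}")
print("selected:", selected)
```

In this example the third feature has a very large length scale, so its share is close to zero and it falls outside the selected subset. In practice the length scales would come from a Gaussian process fitted on standardized inputs, since raw length scales depend on each feature's units.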
About the journal:
Computers & Chemical Engineering is primarily a journal of record for new developments in the application of computing and systems technology to chemical engineering problems.