M. Luck, A. Yartseva, G. Bertho, E. Thervet, P. Beaune, N. Pallet, C. Damon
{"title":"Metabolic Profiling of 1H NMR Spectra in Chronic Kidney Disease with Local Predictive Modeling","authors":"M. Luck, A. Yartseva, G. Bertho, E. Thervet, P. Beaune, N. Pallet, C. Damon","doi":"10.1109/ICMLA.2015.155","DOIUrl":null,"url":null,"abstract":"Metabolic profiling, the study of changes in the concentration of the metabolites in the organism induced by biological differences within subpopulations, has to deal with a very large amount of complex data. It therefore requires the use of powerful data processing and machine learning methods. To overcome over-fitting, a common concern in metabolic profiling where the number of features is often much larger than the number of observations, many predictive analyses combined dimension reduction techniques with multivariate predictive linear modeling. Moreover, they built a global model that identifies biomarkers predictive of the output of interest giving their overall trend variations. However, this fails to capture local biological phenomena underlying subgroups of subjects. More recently, local exploration methods based on decision trees approaches have been applied in metabolomics but they only explore random parts of the feature space. In this study, we used a supervised rule-mining algorithm that locally and exhaustively explores the feature space to predict chronic kidney disease (CDK) stages based on proton Nuclear Magnetic Resonance (1H NMR) data. From the discriminant subregions obtained with this exploration, we extracted local features and learned a L2-regularized Logistic regression (L2LR) classifier. We compared the resulting local predictive model with a standard one, combining classical univariate supervised feature selection techniques with a L2LR, and a model mixing both global and local features. Results show that the local predictive model is more powerful in terms of predictive performance than the mixed and global models. Additionally, it gives key insights into biological variations specific to subgroups of subjects.","PeriodicalId":74528,"journal":{"name":"Proceedings of the ... International Conference on Machine Learning and Applications. International Conference on Machine Learning and Applications","volume":"1 1","pages":"176-181"},"PeriodicalIF":0.0000,"publicationDate":"2015-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"6","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the ... International Conference on Machine Learning and Applications. International Conference on Machine Learning and Applications","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICMLA.2015.155","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 6
Abstract
Metabolic profiling, the study of changes in the concentration of the metabolites in the organism induced by biological differences within subpopulations, has to deal with a very large amount of complex data. It therefore requires the use of powerful data processing and machine learning methods. To overcome over-fitting, a common concern in metabolic profiling where the number of features is often much larger than the number of observations, many predictive analyses combined dimension reduction techniques with multivariate predictive linear modeling. Moreover, they built a global model that identifies biomarkers predictive of the output of interest giving their overall trend variations. However, this fails to capture local biological phenomena underlying subgroups of subjects. More recently, local exploration methods based on decision trees approaches have been applied in metabolomics but they only explore random parts of the feature space. In this study, we used a supervised rule-mining algorithm that locally and exhaustively explores the feature space to predict chronic kidney disease (CDK) stages based on proton Nuclear Magnetic Resonance (1H NMR) data. From the discriminant subregions obtained with this exploration, we extracted local features and learned a L2-regularized Logistic regression (L2LR) classifier. We compared the resulting local predictive model with a standard one, combining classical univariate supervised feature selection techniques with a L2LR, and a model mixing both global and local features. Results show that the local predictive model is more powerful in terms of predictive performance than the mixed and global models. Additionally, it gives key insights into biological variations specific to subgroups of subjects.