Development and validation of machine-learning models of diet management for hyperphenylalaninemia: a multicenter retrospective study
Yajie Su, Yaqiong Wang, Jinfeng He, Huijun Wang, Xian A, Haili Jiang, Wei Lu, Wenhao Zhou, Long Li
BMC Medicine, published 2024-09-11. DOI: 10.1186/s12916-024-03602-w (https://doi.org/10.1186/s12916-024-03602-w)
Abstract
Assessing dietary phenylalanine (Phe) tolerance is crucial for managing hyperphenylalaninemia (HPA) in children. Traditionally, however, adjusting the diet requires significant time from clinicians and parents. This study aimed to develop a machine-learning model that predicts a range of dietary Phe intake tolerance for children with HPA over the 10 years following diagnosis.
In this multicenter retrospective observational study, we collected phenylalanine hydroxylase (PAH) genotypes, metabolic profiles at screening and diagnosis, and blood Phe concentrations corresponding to dietary Phe intake from more than 10 years of follow-up data for 204 children with HPA. To incorporate genetic information, an allelic phenotype value (APV) was estimated for each of 2965 missense variants in the PAH gene using a predicted APV (pAPV) model. This model was trained on known genotype-phenotype relationships from the BioPKU database, using 31 features. Subsequently, a multiclass classification model was constructed and trained on a dataset comprising metabolic data, genetic data, and follow-up data from 3177 events. The final model was fine-tuned using tenfold cross-validation and validated against three independent datasets.
The pAPV model achieved good predictive performance, with a root mean squared error (RMSE) of 1.53 on the training dataset and 2.38 on the test dataset. Variants that cause amino acid changes in the region of residues 200–300 of PAH tended to exhibit lower pAPV. The final model achieved a sensitivity of 0.77 to 0.91 and a specificity of 0.8 to 1 across all validation datasets. Additional assessment metrics, including positive predictive value (0.68–1), negative predictive value (0.8–0.98), F1 score (0.71–0.92), and balanced accuracy (0.8–0.92), indicated robust performance of the model.
Our model integrates metabolic and genetic information to accurately predict age-specific Phe tolerance, aiding the precision management of patients with HPA. This study provides a potential framework that could be applied to other inborn errors of metabolism.
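The evaluation metrics quoted in the abstract (sensitivity, specificity, positive and negative predictive value, F1 score, balanced accuracy, and RMSE) all follow from simple counts and differences. The sketch below shows one common way to compute them for a multiclass classifier by treating one class as "positive" and all others as "negative" (one-vs-rest). The abstract does not describe the authors' implementation; the function names, class labels, and toy data here are hypothetical and serve only to illustrate the definitions.

```python
import math
from typing import Dict, Hashable, Sequence


def one_vs_rest_metrics(
    y_true: Sequence[Hashable],
    y_pred: Sequence[Hashable],
    positive: Hashable,
) -> Dict[str, float]:
    """Binary metrics for one class of a multiclass problem (one-vs-rest)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)

    def ratio(num: float, den: float) -> float:
        # Undefined ratios (empty denominator) are reported as NaN.
        return num / den if den else float("nan")

    sensitivity = ratio(tp, tp + fn)  # true positive rate (recall)
    specificity = ratio(tn, tn + fp)  # true negative rate
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": ratio(tp, tp + fp),    # positive predictive value (precision)
        "npv": ratio(tn, tn + fn),    # negative predictive value
        "f1": ratio(2 * tp, 2 * tp + fp + fn),
        "balanced_accuracy": (sensitivity + specificity) / 2,
    }


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root mean squared error between observed and predicted values."""
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


# Toy data with hypothetical tolerance classes (not from the study).
observed = ["low", "low", "mid", "high", "mid", "low"]
predicted = ["low", "mid", "mid", "high", "mid", "low"]
low_metrics = one_vs_rest_metrics(observed, predicted, positive="low")
toy_rmse = rmse([1.0, 2.0, 3.0], [1.0, 2.0, 5.0])
```

Reporting a range such as "0.77 to 0.91" then amounts to taking the minimum and maximum of one metric across classes or validation datasets; which of these the abstract's ranges span is not stated there.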
About the journal:
BMC Medicine is an open-access, transparent peer-reviewed general medical journal. It is the flagship journal of the BMC series and publishes outstanding and influential research in areas including clinical practice, translational medicine, medical and health advances, public health, global health, policy, and general topics of interest to the biomedical and sociomedical professional communities. In addition to research articles, the journal publishes debates, reviews, forum articles, and concise tutorials. All articles published in BMC Medicine are indexed in databases including Biological Abstracts, BIOSIS, CAS, Citebase, Current Contents, DOAJ, Embase, MEDLINE, PubMed, Science Citation Index Expanded, OAIster, SCImago, Scopus, SOCOLAR, and Zetoc.