Wen Zhu, Chenyi Chen, Lili Zhang, Tammy Hoyt, Elizabeth Walker, Shruthi Venkatesh, Fujun Zhang, Ferhan Qureshi, John F Foley, Zongqi Xia
Brain Communications. Published 2023-10-31. DOI: 10.1093/braincomms/fcad300 (https://doi.org/10.1093/braincomms/fcad300)
Association between serum multi-protein biomarker profile and real-world disability in multiple sclerosis
Abstract: Few studies have examined blood biomarkers that are informative of patient-reported outcomes (PROs) of disability in people with multiple sclerosis (MS). We examined the associations between serum multi-protein biomarker profiles and patient-reported disability. In this cross-sectional study (2017-2020), adults with a diagnosis of MS (or precursors) from two independent clinic-based cohorts were divided into training and test sets. For predictors, we examined 7 clinical factors (age at sample collection, sex, race/ethnicity, disease subtype, disease duration, disease-modifying therapy [DMT], and time interval between sample collection and closest PRO assessment) and 19 serum protein biomarkers potentially associated with MS disease activity endpoints identified from prior studies. We trained machine learning (ML) models (Least Absolute Shrinkage and Selection Operator [LASSO] regression, Random Forest, Extreme Gradient Boosting, Support-Vector Machines, stacking ensemble learning, and stacking classification) to predict the Patient Determined Disease Steps (PDDS) score as the primary endpoint and reported model performance on the held-out test set. The study included 431 participants (mean age 49 years, 81% women, 94% non-Hispanic White). For the binary PDDS score, the combined feature input of routine clinical factors and the 19 proteins consistently outperformed base models (comprising clinical features alone or clinical features plus a single protein at a time) in predicting severe (PDDS ≥ 4) versus mild/moderate (PDDS < 4) disability across multiple ML approaches, with LASSO achieving the best area under the curve (AUC = 0.91 for PDDS) and other metrics. For the ordinal PDDS score, LASSO models with the combined clinical factors and 19 proteins as feature input (R² = 0.31 for PDDS) again outperformed base models.
The two best-performing LASSO models (i.e., binary and ordinal PDDS) shared 6 clinical features (age, sex, race/ethnicity, disease subtype, disease duration, DMT efficacy) and 9 proteins (cluster of differentiation 6, CUB-domain-containing protein 1, contactin-2, interleukin-12 subunit-beta, neurofilament light chain [NfL], protogenin, serpin family A member 9, tumor necrosis factor superfamily member 13B, versican). By comparison, LASSO models with clinical features plus a single protein at a time as feature input did not select either NfL or glial fibrillary acidic protein (GFAP) as a final feature. Forcing either NfL or GFAP into the models as a single protein feature did not improve performance beyond clinical features alone. A stacking classification model that used 5 functional pathways to represent multiple proteins as meta-features implicated the pathways involved in neuroaxonal integrity as significant contributors to predictive performance. Thus, serum multi-protein biomarker profiles improve the prediction of real-world MS disability status beyond the clinical profile alone or the clinical profile plus a single protein biomarker, reaching clinically actionable performance.
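The binary endpoint and the AUC metric reported in the abstract can be illustrated with a minimal sketch. It dichotomizes PDDS at the stated cutoff (PDDS ≥ 4 is severe) and computes AUC as the probability that a randomly chosen severe case receives a higher predicted risk than a randomly chosen mild/moderate case. The function names and the eight-person data set are hypothetical and purely illustrative; they are not the study's data or code, and the authors' actual pipeline is not described at this level of detail.

```python
from typing import List, Sequence

SEVERE_THRESHOLD = 4  # the abstract defines severe disability as PDDS >= 4


def binarize_pdds(pdds_scores: Sequence[int],
                  threshold: int = SEVERE_THRESHOLD) -> List[int]:
    """Map ordinal PDDS scores to 1 (severe) or 0 (mild/moderate)."""
    return [1 if score >= threshold else 0 for score in pdds_scores]


def roc_auc(labels: Sequence[int], predicted: Sequence[float]) -> float:
    """Pairwise (Mann-Whitney) form of AUC; tied pairs count as half a win."""
    positives = [p for y, p in zip(labels, predicted) if y == 1]
    negatives = [p for y, p in zip(labels, predicted) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one case from each class")
    wins = 0.0
    for pos in positives:
        for neg in negatives:
            if pos > neg:
                wins += 1.0
            elif pos == neg:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical held-out test set: PDDS scores and a model's predicted
# risk of severe disability (made-up numbers, not study results).
pdds = [0, 1, 2, 3, 4, 5, 6, 7]
risk = [0.05, 0.10, 0.40, 0.55, 0.50, 0.70, 0.80, 0.95]

labels = binarize_pdds(pdds)
auc = roc_auc(labels, risk)
print(f"labels={labels} AUC={auc:.4f}")
```

In this toy data only one of the 16 severe versus mild/moderate pairs is ranked in the wrong order (risk 0.50 for a severe case against 0.55 for a mild/moderate case), so the AUC is 15/16. The pairwise form is quadratic in sample size, which is acceptable for a cohort of a few hundred participants; a rank-based implementation would be the usual choice for larger data.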