Comparison of dimension reduction methods for the identification of heart-healthy dietary patterns
Natalie C. Gasca, R. McClelland
Observational Studies, published 2023-03-01
DOI: 10.1353/obs.2023.0020 (https://doi.org/10.1353/obs.2023.0020)
Abstract: Most nutritional epidemiology studies investigating diet-disease trends use unsupervised dimension reduction methods, such as principal component regression (PCR) and sparse PCR (SPCR), to create dietary patterns. Supervised methods, such as partial least squares (PLS), sparse PLS (SPLS), and Lasso, offer the possibility of more concisely summarizing the foods most related to a disease. In this study, we evaluate these five methods for interpretable reduction of food frequency questionnaire (FFQ) data when analyzing a univariate continuous cardiac-related outcome, using a simulation study and a data application. We also demonstrate that, when using PLS, different scientific premises require different approaches to covariate adjustment. To emulate food groups, we generated blocks of normally distributed predictors with varying intra-block covariances; only nine of the 24 predictors contributed to the normally distributed response. When block covariances were informed by FFQ data, the only methods that performed variable selection were Lasso and SPLS, which selected two and four irrelevant variables, respectively. SPLS had the lowest prediction error, and both PLS-based methods constructed four patterns, while PCR and SPCR created 24 patterns. These methods were then applied to 120 FFQ variables and baseline body mass index (BMI) from the Multi-Ethnic Study of Atherosclerosis, which includes 6814 participants aged 45-84, adjusting for age, gender, race/ethnicity, exercise, and total energy intake. From the 120 variables, PCR created 17 BMI-related patterns and PLS selected one pattern; SPLS used only five variables to create two patterns. All methods exhibited similar predictive performance. SPLS’s first pattern highlighted hamburger and diet soda intake (both positively associated with BMI), reflecting a fast food diet. By selecting fewer patterns and foods, SPLS can create interpretable dietary patterns while maintaining predictive ability.
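The simulation design described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' code. The abstract specifies only 24 normally distributed predictors in blocks, nine of which drive a normal response; the number of blocks, the intra-block correlations, the positions and sizes of the nonzero coefficients, the sample sizes, and the function names (`simulate`, `pcr_fit`, `pls1_fit`, `mse`) are all assumptions made for illustration. Only the two non-sparse methods are sketched: PCR via a singular value decomposition, and single-response PLS via a NIPALS-style deflation. SPCR, SPLS, and Lasso are omitted.

```python
import numpy as np

def simulate(n, rng, block_corrs=(0.2, 0.4, 0.6, 0.8), block_size=6):
    """Draw block-correlated normal predictors and a normal response.

    Block count, correlations, and coefficients are assumed values,
    not taken from the paper.
    """
    p = block_size * len(block_corrs)          # 4 blocks x 6 = 24 predictors
    cov = np.zeros((p, p))
    for b, rho in enumerate(block_corrs):
        idx = slice(b * block_size, (b + 1) * block_size)
        cov[idx, idx] = rho                    # constant intra-block covariance
    np.fill_diagonal(cov, 1.0)                 # unit variances
    X = rng.multivariate_normal(np.zeros(p), cov, size=n)
    beta = np.zeros(p)
    relevant = [0, 1, 2, 6, 7, 8, 12, 13, 14]  # nine relevant predictors (hypothetical)
    beta[relevant] = 0.5
    y = X @ beta + rng.normal(size=n)
    return X, y, beta

def pcr_fit(X, y, k):
    """Principal component regression on the first k components."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    V = Vt[:k].T                               # p x k loadings, chosen without y
    gamma = np.linalg.lstsq(Xc @ V, yc, rcond=None)[0]
    coef = V @ gamma
    return coef, y_mean - x_mean @ coef

def pls1_fit(X, y, k):
    """Single-response PLS with k components (NIPALS-style deflation)."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xk, yc = X - x_mean, y - y_mean
    p = X.shape[1]
    W, P, q = np.zeros((p, k)), np.zeros((p, k)), np.zeros(k)
    for a in range(k):
        w = Xk.T @ yc                          # weights follow covariance with y
        w /= np.linalg.norm(w)
        t = Xk @ w                             # component scores
        tt = t @ t
        P[:, a] = Xk.T @ t / tt                # X loadings
        q[a] = yc @ t / tt                     # y loading
        W[:, a] = w
        Xk = Xk - np.outer(t, P[:, a])         # deflate X
    coef = W @ np.linalg.solve(P.T @ W, q)
    return coef, y_mean - x_mean @ coef

def mse(X, y, coef, intercept):
    """Mean squared prediction error."""
    return float(np.mean((y - (X @ coef + intercept)) ** 2))

rng = np.random.default_rng(0)
X_train, y_train, beta = simulate(500, rng)
X_test, y_test, _ = simulate(500, rng)

for k in (1, 2, 4, 8):
    pcr = mse(X_test, y_test, *pcr_fit(X_train, y_train, k))
    pls = mse(X_test, y_test, *pls1_fit(X_train, y_train, k))
    print(f"k={k}  PCR test MSE={pcr:.3f}  PLS test MSE={pls:.3f}")
```

The contrast the sketch is meant to show is structural: PCR picks its components from the predictors alone, whereas PLS builds each component from the covariance between the predictors and the response. That is why a supervised method might be expected to need fewer patterns for comparable prediction error. Any numbers printed depend entirely on the assumed design and should not be read as a reproduction of the paper's results.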