Authors: Ziqian Zhuang, Wei Xu, Rahi Jain
Journal: University of Toronto Journal of Public Health
Published: 2021-09-13
DOI: https://doi.org/10.33137/utjph.v2i2.36764
High dimensional Selection with Interactions for Binary Outcome (HDSI-BO) Algorithm in Classifying Height Indicators Through Social-life and Well-being Factors
Introduction: The High dimensional Selection with Interactions for Binary Outcome (HDSI-BO) algorithm can incorporate interaction terms and can be combined with existing feature selection techniques. Simulation studies have validated the ability of HDSI-BO to select true features and, consequently, to improve prediction accuracy compared with standard algorithms. Our goal is to assess how well HDSI-BO combines with different techniques and to measure its predictive performance in a real-data study that predicts height indicators from social-life and well-being factors.
Methods: HDSI-BO was combined with logistic regression, ridge regression, LASSO, adaptive LASSO, and elastic net. Two-way interaction terms were considered. The hyperparameters of HDSI-BO were optimized with genetic algorithms under five-fold cross-validation. To measure feature selection performance, we fitted a final logistic regression model on each set of selected features and used that model's AUC as the measure. Thirty trials were run to obtain the range of the number of selected features and a 95% confidence interval for the AUC.
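The evaluation quantities named in the Methods can be illustrated with a minimal, standard-library-only sketch. It does not reproduce HDSI-BO, the genetic-algorithm tuning, or the model fitting. The function names are hypothetical, the numbers are made up, and the normal-approximation interval is an assumption, since the abstract does not say how the 95% confidence interval was computed.

```python
import itertools
import math
import statistics


def add_two_way_interactions(row):
    """Append the product of every pair of features to a feature row."""
    pairs = itertools.combinations(row, 2)
    return list(row) + [a * b for a, b in pairs]


def auc(labels, scores):
    """AUC as the probability that a positive case outranks a negative one.

    A tie between a positive and a negative score counts as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative label")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def mean_ci95(values):
    """Mean and a normal-approximation 95% interval for the mean.

    Assumption: the source does not state which interval method was used.
    """
    m = statistics.mean(values)
    half = 1.96 * statistics.stdev(values) / math.sqrt(len(values))
    return m, (m - half, m + half)


# Toy usage with made-up numbers, not data from the study.
print(add_two_way_interactions([2, 3, 5]))      # features plus pairwise products
print(auc([1, 1, 0, 0], [0.9, 0.4, 0.6, 0.1]))  # one of the four pairs is mis-ordered
trial_aucs = [0.70 + 0.01 * (i % 5) for i in range(30)]  # stand-in for 30 trial AUCs
print(mean_ci95(trial_aucs))
```

In a full pipeline, the scores passed to `auc` would come from the final logistic regression fitted on the selected features, and `mean_ci95` would be applied to the 30 per-trial AUC values.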
Results: When combined with each of the above methods, HDSI-BO achieved higher final AUC values than the standard method, in terms of both the mean and the confidence interval. In addition, HDSI-BO effectively narrowed down the sets of selected features and interaction terms compared with the standard methods.
Conclusion: The HDSI-BO algorithm combines well with multiple standard methods and has comparable or better predictive performance than those methods. The computational and time complexity of HDSI-BO is higher but still acceptable. AUC as a single metric cannot comprehensively measure feature selection performance; more effective performance metrics should be explored in future work.