Forward stepwise random forest analysis for experimental designs
Author: Chang-Yun Lin
Journal: Journal of Quality Technology, volume 22 1, pages 488-504 (JCR Q2, Engineering, Industrial; impact factor 2.6)
Published: 2021-01-28
DOI: https://doi.org/10.1080/00224065.2020.1865853
Abstract: In experimental designs, it is usually assumed that the data follow normal distributions and that the models have linear structures. In practice, experimenters may encounter different types of responses and be uncertain about the model structure. In such cases, traditional methods such as ANOVA and regression are not suitable for data analysis and model selection. We introduce random forest analysis, a powerful machine learning method capable of analyzing numerical and categorical data with complicated model structures. To perform model selection and factor identification with the random forest method, we propose a forward stepwise algorithm based on minimizing the out-of-bag (OOB) error and develop Python and R code for it. Six examples, including simulation and case studies, are provided. We compare the performance of the proposed method with that of some frequently used analysis methods. The results show that the forward stepwise random forest analysis generally has high power for identifying active factors and selects models with high prediction accuracy.
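The abstract describes forward stepwise selection driven by the OOB error but gives no details, so the following is only a minimal sketch of that general idea and not the author's published code. The names `forward_stepwise`, `make_rf_oob_error`, and `min_gain` are hypothetical. The stopping rule (stop when no candidate lowers the error), the use of OOB mean squared error for a numerical response, and the choice of scikit-learn's `RandomForestRegressor` are all assumptions.

```python
from __future__ import annotations

from typing import Callable, Sequence


def forward_stepwise(
    candidates: Sequence[int],
    error_fn: Callable[[list[int]], float],
    min_gain: float = 0.0,
) -> tuple[list[int], float]:
    """Greedy forward selection over factor indices.

    At each step, add the candidate whose inclusion gives the lowest error.
    Stop when the best candidate improves the error by no more than
    `min_gain`. Assumption: the first factor is always added, because the
    starting error is taken to be infinite (no null model is fitted).
    """
    selected: list[int] = []
    remaining = list(candidates)
    best_err = float("inf")
    while remaining:
        # Score every one-factor extension of the current model.
        trials = [(error_fn(selected + [c]), c) for c in remaining]
        err, chosen = min(trials)
        if best_err - err <= min_gain:
            break  # no remaining factor lowers the error enough
        selected.append(chosen)
        remaining.remove(chosen)
        best_err = err
    return selected, best_err


def make_rf_oob_error(X, y, n_trees: int = 500, seed: int = 0):
    """Build an error function that returns the OOB mean squared error of a
    random forest fitted on the chosen columns of X (a 2-D NumPy array).

    Requires NumPy and scikit-learn; they are imported lazily so that
    `forward_stepwise` itself has no third-party dependency.
    """
    import numpy as np
    from sklearn.ensemble import RandomForestRegressor

    def error_fn(cols: list[int]) -> float:
        rf = RandomForestRegressor(
            n_estimators=n_trees, oob_score=True, random_state=seed
        )
        rf.fit(X[:, cols], y)
        # oob_prediction_ holds each run's prediction from the trees
        # that did not see that run during bootstrap sampling.
        return float(np.mean((y - rf.oob_prediction_) ** 2))

    return error_fn
```

With a design matrix `X` and response vector `y`, a call such as `forward_stepwise(range(X.shape[1]), make_rf_oob_error(X, y))` would return the selected factor indices and the final OOB error. For a categorical response, the same loop could be paired with an error function built on `RandomForestClassifier` and its OOB misclassification rate.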
About the journal:
The objective of the Journal of Quality Technology is to contribute to the technical advancement of the field of quality technology by publishing papers that emphasize the practical applicability of new techniques, instructive examples of the operation of existing techniques, and results of historical research. Expository, review, and tutorial papers are also acceptable if they are written in a style suitable for practicing engineers.