Random-Splitting Random Forest with Multiple Mixed-Data Covariates

Mohammad Fayaz, Alireza Abadi, Soheila Khodakarim
Journal of Biostatistics and Epidemiology. Published 2023-10-31. DOI: 10.18502/jbe.v9i1.13974 (https://doi.org/10.18502/jbe.v9i1.13974)
Citations: 0

Abstract

Introduction: Bagging (BG) and random forest (RF) are well-known supervised statistical learning methods based on classification and regression trees. Both can handle different types of responses, such as categorical and continuous ones. Many statistical applications involve curves, time series, functional data, or observations that are related to each other through their domain. In the literature, RF methods have been extended to some cases in which functional data appear as covariates or responses. Among these extensions, random-splitting summarizes functional data into multiple related summary statistics, such as the average.

Methods: This article extends the random-splitting method and introduces the mixed-data BG (MD-BG) and RF (MD-RF) algorithms for multiple functional and non-functional (mixed or hybrid) covariates. It also calculates the variable importance plot (VIP) for each covariate.

Results: The main difference between MD-BG and MD-RF lies in how covariates are chosen: in MD-BG all covariates remain in the model, whereas MD-RF uses a random sample of covariates. MD-RF helps to unmask the most important parts of the functional covariates and the most important non-functional covariates.

Conclusion: We apply our methods to two datasets, DTI and Tecator, and compare their performance for continuous and categorical responses using the developed R package ("RSRF"), which is available on GitHub.
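To make the random-splitting idea concrete, the following is a minimal sketch in Python and not the RSRF package's implementation. It assumes that each functional covariate is observed on a common grid, that the grid is cut into contiguous intervals at randomly drawn points, and that each interval is summarized by its mean. The function names, the choice of the mean as the only summary statistic, and the square-root rule for the random subset size are all illustrative assumptions rather than details taken from the article.

```python
import random
from statistics import mean


def random_split_points(n_points, n_intervals, rng):
    """Draw sorted cut indices that split range(n_points) into
    n_intervals contiguous, non-empty intervals (assumed scheme)."""
    cuts = sorted(rng.sample(range(1, n_points), n_intervals - 1))
    return [0] + cuts + [n_points]


def summarize_curve(curve, bounds):
    """Replace one curve by the mean of its values over each interval."""
    return [mean(curve[a:b]) for a, b in zip(bounds, bounds[1:])]


def build_features(curves, scalars, n_intervals, rng):
    """Combine interval means of a functional covariate with the
    non-functional (scalar) covariates into one feature row per subject."""
    bounds = random_split_points(len(curves[0]), n_intervals, rng)
    rows = [summarize_curve(c, bounds) + list(s) for c, s in zip(curves, scalars)]
    return rows, bounds


def candidate_features(n_features, rng, use_all):
    """Bagging-style: every covariate is a split candidate.
    RF-style: a random subset (size ~ sqrt(p) is an assumed default)."""
    if use_all:
        return list(range(n_features))
    k = max(1, round(n_features ** 0.5))
    return sorted(rng.sample(range(n_features), k))


# Toy data: 4 subjects, one curve of 10 grid points and one scalar each.
rng = random.Random(0)
curves = [[float(i + j) for j in range(10)] for i in range(4)]
scalars = [(0.5,), (1.5,), (2.5,), (3.5,)]
rows, bounds = build_features(curves, scalars, n_intervals=3, rng=rng)
bagging_candidates = candidate_features(len(rows[0]), rng, use_all=True)
forest_candidates = candidate_features(len(rows[0]), rng, use_all=False)
```

In a full ensemble one would presumably redraw the split points for every tree, so that the importance of a summary feature can be traced back to the part of the curve it covers. That step, the tree fitting, and the importance calculation are left out of this sketch.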