A Note on the Required Sample Size of Model-Based Dose-Finding Methods for Molecularly Targeted Agents

S. Hong, Ying Sun, H. Li, Lynn Hs
Journal: Austin Biometrics and Biostatistics
Publication date: 2021-03-03
Publication type: Journal Article
DOI: https://doi.org/10.26420/AUSTINBIOMANDBIOSTAT.2021.1037
Source: Semantic Scholar
Citations: 1

Abstract

Random forest has proven to be a successful machine learning method, but it can be time-consuming on large datasets, especially in iterative tasks. Machine learning iterative imputation methods are well accepted by researchers for imputing missing data, but they can take longer than standard imputation methods. To overcome this drawback, different parallel computing strategies have been proposed, but their impact on imputation results and subsequent statistical analyses is relatively unknown. Newer random forest implementations, such as ranger and randomForestSRC, offer alternatives that are easier to parallelize, but their validity for iterative imputation is still unclear. Using the random-forest imputation algorithm missForest as an example, this study examines two parallelized methods built on these newer random forest implementations and compares them with the two parallel strategies (variable-wise distributed computation and model-wise distributed computation) that use language-level parallelization from the software package. Results from the simulation experiments showed that the parallel strategies can affect both the imputation process and the final imputation results in different ways. Different parallel strategies improve computational speed to varying degrees, and based on the simulations, ranger can provide a performance boost for datasets of different sizes with reasonable accuracy. Specifically, even though different strategies can produce similar normalized root mean squared prediction errors, the variable-wise distributed strategy led to additional biases when estimating the means and inter-correlations of the covariates and their regression coefficients. In addition, parallelization with randomForestSRC can change both the prediction errors and the estimates.
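The abstract compares strategies by normalized root mean squared prediction error but does not define it. As a minimal sketch, assuming the variance-normalized form often used with missForest for continuous variables (mean squared error over the originally missing entries, divided by the variance of their true values), the metric could be computed as below. The function name `nrmse`, the mask convention, and the toy data are illustrative assumptions and are not taken from the paper.

```python
import numpy as np


def nrmse(x_true: np.ndarray, x_imputed: np.ndarray, missing_mask: np.ndarray) -> float:
    """Normalized RMSE over the entries flagged as missing (mask == True).

    Assumed definition: sqrt(mean((true - imputed)^2) / var(true)),
    evaluated only on the entries that were masked out before imputation.
    """
    truth = x_true[missing_mask]
    imputed = x_imputed[missing_mask]
    mse = np.mean((truth - imputed) ** 2)
    return float(np.sqrt(mse / np.var(truth)))


# Toy complete data and a made-up missingness pattern (illustration only).
x_true = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
mask = np.array([[True, False], [False, True], [True, False]])

# A deliberately naive imputation: fill every masked entry with one constant,
# the mean of the true masked values.
x_imp = x_true.copy()
x_imp[mask] = x_true[mask].mean()

score = nrmse(x_true, x_imp, mask)
print(f"NRMSE of constant imputation: {score:.3f}")
```

Under this definition a perfect imputation scores 0 and the constant fill above scores about 1, so values well below 1 indicate that an imputation recovers more than the overall spread of the missing values. As the abstract notes, two strategies can have similar values of this error and still differ in the bias of downstream estimates.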