A closer look at the kernels generated by the decision and regression tree ensembles
Authors: Dai Feng, R. Baumgartner
Journal: Statistics in Biopharmaceutical Research (impact factor 1.5; JCR Q3, Mathematical & Computational Biology)
Published: 2022-12-08
DOI: 10.1080/19466315.2022.2150680 (https://doi.org/10.1080/19466315.2022.2150680)
Citations: 1
Abstract
Tree ensembles can be interpreted as implicit kernel generators, where the ensuing proximity matrix represents the data-driven tree ensemble kernel. The focus of our work is the utility of tree-based ensembles as kernel generators that (in conjunction with a regularized linear model) enable kernel learning. We elucidate the performance of the tree-based random forest (RF) and gradient boosted tree (GBT) kernels in a comprehensive simulation study comprising continuous and binary targets. We show that for continuous targets (regression), this kernel learning approach is competitive with the respective tree ensemble in higher-dimensional scenarios, particularly in cases with a larger number of noisy features. For the binary target (classification), the tree ensemble-based kernels and their respective ensembles exhibit comparable performance. We provide results from several real-life datasets for regression and classification, relevant to biopharmaceutical and biomedical applications, that are in line with the simulations and show how these insights may be leveraged in practice. We discuss the general applicability and extensions of the tree ensemble-based kernels for survival targets and interpretable landmarking in classification and regression. Finally, we outline future research on kernel learning arising from feature space partitionings.
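The idea of a proximity matrix serving as a kernel for a regularized linear model can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes the commonly used proximity definition (the fraction of trees in which two samples share a terminal node) and uses kernel ridge regression as a stand-in for the regularized linear model. The leaf assignments are toy values entered by hand; in practice they might come from something like scikit-learn's `RandomForestRegressor.apply`. All function names and the penalty `lam` are hypothetical.

```python
from typing import List, Sequence


def proximity_kernel(leaves_a: Sequence[Sequence[int]],
                     leaves_b: Sequence[Sequence[int]]) -> List[List[float]]:
    """K[i][j] = fraction of trees in which sample i of A and sample j of B
    fall into the same terminal node (assumed proximity definition)."""
    n_trees = len(leaves_a[0])
    return [
        [sum(la == lb for la, lb in zip(row_a, row_b)) / n_trees
         for row_b in leaves_b]
        for row_a in leaves_a
    ]


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def kernel_ridge_fit(k_train: List[List[float]], y: List[float],
                     lam: float) -> List[float]:
    """Dual coefficients alpha solving (K + lam * I) alpha = y."""
    n = len(y)
    regularized = [[k_train[i][j] + (lam if i == j else 0.0)
                    for j in range(n)] for i in range(n)]
    return solve(regularized, y)


def kernel_ridge_predict(k_new: List[List[float]],
                         alpha: List[float]) -> List[float]:
    """Predictions are kernel-weighted sums of the dual coefficients."""
    return [sum(k * a for k, a in zip(row, alpha)) for row in k_new]


# Toy leaf indices (hypothetical): 4 training samples, 3 trees.
train_leaves = [[1, 4, 2], [1, 4, 3], [2, 5, 3], [2, 5, 2]]
y_train = [1.0, 1.2, 3.0, 3.1]

K = proximity_kernel(train_leaves, train_leaves)
alpha = kernel_ridge_fit(K, y_train, lam=0.1)

# A new sample that lands in the same leaves as training sample 0.
new_leaves = [[1, 4, 2]]
pred = kernel_ridge_predict(proximity_kernel(new_leaves, train_leaves), alpha)
print(pred)
```

Because the proximity matrix is an average of per-tree indicator kernels, it is positive semidefinite, so adding `lam` to the diagonal keeps the linear system solvable. A classification analogue would swap the ridge step for a regularized classifier on the same kernel.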
About the journal:
Statistics in Biopharmaceutical Research (SBR) publishes articles that focus on the needs of researchers and applied statisticians in biopharmaceutical industries; academic biostatisticians from schools of medicine, veterinary medicine, public health, and pharmacy; statisticians and quantitative analysts working in regulatory agencies (e.g., the U.S. Food and Drug Administration and its counterparts in other countries); statisticians with an interest in adopting methodology presented in this journal to their own fields; and nonstatisticians with an interest in applying statistical methods to biopharmaceutical problems.
Statistics in Biopharmaceutical Research accepts papers that discuss appropriate statistical methodology and information regarding the use of statistics in all phases of research, development, and practice in the pharmaceutical, biopharmaceutical, device, and diagnostics industries. Articles should focus on the development of novel statistical methods, novel applications of current methods, or the innovative application of statistical principles that can be used by statistical practitioners in these disciplines. Areas of application may include statistical methods for drug discovery, including papers that address issues of multiplicity, sequential trials, adaptive designs, etc.; preclinical and clinical studies; genomics and proteomics; bioassay; biomarkers and surrogate markers; models and analyses of drug history, including pharmacoeconomics, product life cycle, detection of adverse events in clinical studies, and postmarketing risk assessment; regulatory guidelines, including issues of standardization of terminology (e.g., CDISC), tolerance and specification limits related to pharmaceutical practice, and novel methods of drug approval; and detection of adverse events in clinical and toxicological studies. Tutorial articles also are welcome. Articles should include demonstrable evidence of the usefulness of this methodology (presumably by means of an application).
The Editorial Board of SBR intends to ensure that the journal continually provides important, useful, and timely information. To accomplish this, the board strives to attract outstanding articles by seeing that each submission receives a careful, thorough, and prompt review.
Authors can choose to publish gold open access in this journal.