Boost-R: Gradient boosted trees for recurrence data

IF 2.6 · Zone 2 (Engineering and Technology) · Q2 ENGINEERING, INDUSTRIAL · Journal of Quality Technology · Pub Date: 2021-07-03 · DOI: 10.1080/00224065.2021.1948373
Xiao Liu, Rong Pan
Pages: 545-565
Citations: 2

Abstract

Recurrence data arise in many domains, including reliability, cybersecurity, healthcare, and online retailing. This paper investigates an additive-tree-based approach, known as Boost-R (Boosting for Recurrence Data), for recurrent event data with both static and dynamic features. Boost-R constructs an ensemble of gradient boosted additive trees to estimate the cumulative intensity function of the recurrent event process; each new tree is added to the ensemble by minimizing the regularized L2 distance between the observed and predicted cumulative intensity. Unlike conventional regression trees, which store a constant on each leaf, Boost-R constructs a time-dependent function on each tree leaf. The sum of these functions over all trees yields the ensemble estimator of the cumulative intensity. The divide-and-conquer nature of tree-based methods is appealing when hidden sub-populations exist within a heterogeneous population, and the non-parametric nature of regression trees avoids parametric assumptions about the complex interactions between event processes and features. The insights and advantages of Boost-R are investigated through comprehensive numerical examples. Datasets and computer code for Boost-R are made available on GitHub. To the best of our knowledge, Boost-R is the first gradient boosted additive-tree-based approach for modeling large-scale recurrent event data with both static and dynamic feature information.
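
The idea in the abstract (trees whose leaves hold functions of time, added one at a time to reduce a regularized L2 distance to the observed cumulative counts) can be illustrated with a toy sketch. This is not the authors' Boost-R code. It assumes a single static feature, depth-one trees (stumps), a fixed time grid, a ridge penalty `lam`, and a learning rate `lr`. All function names, the toy data, and the hyperparameters are hypothetical. It ignores dynamic features and does not force the estimate to be non-decreasing in time.

```python
# Illustrative sketch only; NOT the authors' Boost-R implementation.
# All names, the toy data, and the hyperparameters are hypothetical.

def cumulative_counts(event_times, grid):
    """Observed cumulative number of events at each grid time."""
    return [sum(1 for s in event_times if s <= t) for t in grid]

def leaf_function(residuals, lam):
    """Time-dependent leaf value f(t) minimizing
    sum_i sum_t (r_i(t) - f(t))^2 + lam * sum_t f(t)^2."""
    n = len(residuals)
    return [sum(r[k] for r in residuals) / (n + lam)
            for k in range(len(residuals[0]))]

def penalized_loss(residuals, f, lam):
    sse = sum((r[k] - f[k]) ** 2 for r in residuals for k in range(len(f)))
    return sse + lam * sum(v * v for v in f)

def fit_stump(x, residuals, lam):
    """Best single split on one static feature; each leaf holds a function of time."""
    best = None
    values = sorted(set(x))
    for lo, hi in zip(values, values[1:]):
        thr = (lo + hi) / 2
        left = [r for xi, r in zip(x, residuals) if xi <= thr]
        right = [r for xi, r in zip(x, residuals) if xi > thr]
        f_left, f_right = leaf_function(left, lam), leaf_function(right, lam)
        loss = penalized_loss(left, f_left, lam) + penalized_loss(right, f_right, lam)
        if best is None or loss < best[0]:
            best = (loss, thr, f_left, f_right)
    if best is None:
        raise ValueError("need at least two distinct feature values")
    return best[1:]

def boost(x, curves, n_trees=100, lr=0.5, lam=1.0):
    """Add stumps one at a time, each fitted to the current residual curves."""
    n_times = len(curves[0])
    pred = [[0.0] * n_times for _ in curves]
    trees = []
    for _ in range(n_trees):
        resid = [[c[k] - p[k] for k in range(n_times)] for c, p in zip(curves, pred)]
        thr, f_left, f_right = fit_stump(x, resid, lam)
        # Shrink each leaf function by the learning rate before adding it.
        f_left = [lr * v for v in f_left]
        f_right = [lr * v for v in f_right]
        trees.append((thr, f_left, f_right))
        for xi, p in zip(x, pred):
            f = f_left if xi <= thr else f_right
            for k in range(n_times):
                p[k] += f[k]
    return trees

def predict(trees, xi, n_times):
    """Ensemble estimate: sum of the leaf functions selected by feature value xi."""
    out = [0.0] * n_times
    for thr, f_left, f_right in trees:
        f = f_left if xi <= thr else f_right
        for k in range(n_times):
            out[k] += f[k]
    return out

# Toy data: two low-rate units and two high-rate units (made up for illustration).
grid = [1, 2, 3, 4]
x = [0.1, 0.2, 0.8, 0.9]
events = [[3.5], [2.5], [0.5, 1.5, 2.5, 3.5], [0.5, 1.2, 2.2, 3.9]]
curves = [cumulative_counts(e, grid) for e in events]
trees = boost(x, curves)
print(predict(trees, 0.9, len(grid)))
```

On this toy data the first split separates the low-rate units from the high-rate ones, which is the "hidden sub-population" behavior the abstract describes. Storing a whole curve on each leaf, rather than a scalar, is what lets the summed ensemble output a cumulative-intensity estimate over time.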