Construction of an Assistant Forecast System for Breast Cancer Oncotype Dx Recurrence Risk by Machine Learning

X. Kong, Lin Zhang, Quanda Zhang, Jiun Choong, Sicong Ma, X. Qin, Z. Qi, Ran Cheng, Yi Fang, Z. Ge, Yu Jiang, Jing Wang
DOI: 10.2139/ssrn.3642585
Journal: Biomaterials eJournal, volume 24, issue 1 (published 2020-07-03)
Citations: 1

Abstract

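Background: Data from the TAILORx trial confirm that using the 21-gene expression assay Oncotype DX (ODX; Genomic Health, Redwood City, CA) to assess the risk of early-stage breast cancer recurrence can spare women unnecessary chemotherapy. However, its high up-front cost (list price $4,175) may discourage use, and for technical reasons the test cannot be widely used in developing countries, especially in relatively poor areas.

Methods: Using the Surveillance, Epidemiology, and End Results (SEER) database, we first applied logistic regression models to identify variables significantly associated with breast cancer patients' ODX recurrence scores (RS) and risk levels. We then applied a series of machine learning (ML) methods, including random forest (RF), gradient boosting decision tree (GBDT), and XGBoost, to develop an assistant forecast system that predicts ODX recurrence risk [low-to-intermediate risk (RS = 2-25) vs. high risk (RS = 26-100)] from each patient's sociodemographic and clinicopathological information. The system was validated on a held-out validation set obtained by splitting the original data set into training and test sets.

Findings: We identified 111,635 patients with breast cancer, of whom 86,617 (77.59%) were 50 years old or younger. By ODX RS, 23,514 patients (21.1%) were in the low-risk group, 71,439 (64.0%) in the intermediate-risk group, and 16,682 (14.9%) in the high-risk group. In ordered (ordinal) logit regression, the variables significantly associated with ODX RS were age, sex, race, primary tumor site, histopathological grade, tumor size, pathology, PR status, and HER2 status (all P < 0.05). Given a patient's precise sociodemographic and clinicopathological information, the system automatically predicts the patient's ODX recurrence risk level together with an associated probability. In validation, the best overall accuracy was 87.02% and the best specificity 99.06% (both achieved by ordered logistic regression), while the best sensitivity was 86.0% (achieved by RF).

Interpretation: Built on sociodemographic and clinicopathological data, our assistant forecast system gives clinicians an alternative tool for estimating a breast cancer patient's ODX recurrence risk level, which may help inform adjuvant treatment decisions. The tool warrants retrospective validation in clinical practice and subsequent application in real clinical settings.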
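The binary endpoint and the validation metrics described in the Methods and Findings can be sketched in Python as follows. This is a minimal, standard-library-only illustration under stated assumptions, not the authors' code: the function names, the 70/30 split fraction, the seed, and the toy labels are hypothetical, and only the RS cut point (high risk = RS 26 to 100) comes from the abstract. The classifiers themselves (ordered logistic regression, RF, GBDT, XGBoost) are omitted; any model fitted on the training rows could supply the predictions scored here.

```python
import random
from typing import Sequence, Tuple

# From the abstract: high risk = RS 26-100, low-to-intermediate = RS <= 25.
HIGH_RISK_MIN_RS = 26


def risk_label(rs: int) -> int:
    """Return 1 for high risk (RS >= 26), 0 for low-to-intermediate risk."""
    if not 0 <= rs <= 100:
        raise ValueError(f"recurrence score out of range: {rs}")
    return 1 if rs >= HIGH_RISK_MIN_RS else 0


def train_test_split(rows: Sequence, test_fraction: float = 0.3,
                     seed: int = 0) -> Tuple[list, list]:
    """Hypothetical helper: shuffle a copy of rows, hold out test_fraction for validation."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is reproducible
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]  # (train, test)


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> dict:
    """Accuracy, sensitivity and specificity, treating high risk (1) as positive."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equal length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": tp / (tp + fn) if tp + fn else float("nan"),
        "specificity": tn / (tn + fp) if tn + fp else float("nan"),
    }
```

For example, with toy scores [5, 12, 18, 25, 26, 31, 40, 22], `risk_label` yields [0, 0, 0, 0, 1, 1, 1, 0]; scoring hypothetical predictions [0, 0, 1, 0, 1, 0, 1, 0] with `binary_metrics` gives accuracy 0.75, sensitivity 2/3, and specificity 0.8. Reporting the best value of each metric separately, as the abstract does, means the three figures may come from different models.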