Deep learning based decision tree ensembles for incomplete medical datasets.

IF 1.4 | Zone 4 (Medicine) | JCR Q4, ENGINEERING, BIOMEDICAL | Technology and Health Care, pp. 75-87 | Pub Date: 2024-01-01 | DOI: 10.3233/THC-220514

Chien-Hung Chiu, Shih-Wen Ke, Chih-Fong Tsai, Wei-Chao Lin, Min-Wei Huang, Yi-Hsiu Ko


Citations: 0

Abstract




Background: In practice, datasets collected for data analysis are usually incomplete because some records contain missing attribute values. Many related works build specific models that estimate replacement values for the missing entries, so that the originally incomplete dataset becomes complete. Another type of solution handles incomplete datasets directly, without missing value imputation; decision trees are the major technique for this purpose.

Objective: To introduce a novel approach, Deep Learning-based Decision Tree Ensembles (DLDTE), which borrows the bounding box and sliding window strategies used in deep learning techniques to divide an incomplete dataset into a number of subsets, learns a decision tree from each subset, and thereby produces an ensemble of decision trees.
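
The abstract does not specify how DLDTE defines its windows, what exactly the bounding box covers, or how the trees are combined. The sketch below is one possible reading, not the authors' algorithm: it assumes a window slides over the feature columns, takes the "box" to be the block of rows that are fully observed inside each window, trains a small CART-style tree (Gini impurity, depth-limited) on each box, and classifies a test sample by majority vote over the trees whose window features that sample has observed. All function names and the `width`, `stride`, and `max_depth` parameters are hypothetical.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def build_tree(X, y, max_depth=3, min_size=2):
    """Tiny CART-style tree on complete rows.

    Internal nodes are (feature, threshold, left, right) tuples; leaves are
    class labels (so labels themselves must not be tuples).
    """
    majority = Counter(y).most_common(1)[0][0]
    if max_depth == 0 or len(y) < min_size or len(set(y)) == 1:
        return majority
    best = None  # (weighted Gini, feature, threshold)
    for f in range(len(X[0])):
        vals = sorted({row[f] for row in X})
        for lo, hi in zip(vals, vals[1:]):
            t = (lo + hi) / 2  # midpoint between adjacent observed values
            left = [lab for row, lab in zip(X, y) if row[f] <= t]
            right = [lab for row, lab in zip(X, y) if row[f] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if best is None or score < best[0]:
                best = (score, f, t)
    if best is None:  # every feature is constant: no split possible
        return majority
    _, f, t = best
    li = [i for i, row in enumerate(X) if row[f] <= t]
    ri = [i for i, row in enumerate(X) if row[f] > t]
    return (f, t,
            build_tree([X[i] for i in li], [y[i] for i in li], max_depth - 1, min_size),
            build_tree([X[i] for i in ri], [y[i] for i in ri], max_depth - 1, min_size))

def predict_tree(node, x):
    """Follow thresholds from the root until a leaf label is reached."""
    while isinstance(node, tuple):
        f, t, left, right = node
        node = left if x[f] <= t else right
    return node

def fit_window_ensemble(X, y, width=2, stride=1, max_depth=3):
    """Slide a window over feature columns; train one tree per window on the
    rows whose values are all observed inside that window (the 'box')."""
    ensemble = []
    for start in range(0, len(X[0]) - width + 1, stride):
        cols = list(range(start, start + width))
        rows = [i for i, row in enumerate(X) if all(row[c] is not None for c in cols)]
        if len(rows) < 2:
            continue  # too few fully observed rows to learn anything
        Xs = [[X[i][c] for c in cols] for i in rows]
        ys = [y[i] for i in rows]
        ensemble.append((cols, build_tree(Xs, ys, max_depth)))
    fallback = Counter(y).most_common(1)[0][0]  # used when no tree applies
    return ensemble, fallback

def predict_window_ensemble(model, x):
    """Majority vote over the trees whose window features are observed in x."""
    ensemble, fallback = model
    votes = [predict_tree(tree, [x[c] for c in cols])
             for cols, tree in ensemble
             if all(x[c] is not None for c in cols)]
    return Counter(votes).most_common(1)[0][0] if votes else fallback

# Toy incomplete dataset: None marks a missing value.
X_train = [
    [1.0, 5.0, None],
    [2.0, None, 0.5],
    [None, 6.0, 0.4],
    [8.0, 1.0, 3.0],
    [9.0, None, 2.5],
    [None, 2.0, 2.8],
]
y_train = [0, 0, 0, 1, 1, 1]
model = fit_window_ensemble(X_train, y_train, width=2, stride=1)
print(predict_window_ensemble(model, [8.5, 1.5, 2.9]))   # prints 1
print(predict_window_ensemble(model, [1.2, 5.5, None]))  # prints 0 (only the first tree votes)
```

No value is ever imputed here: each tree sees only complete cells, and a test sample with gaps simply gets fewer votes. Whether DLDTE windows over features, over samples, or both is not stated in the abstract.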

Method: Two medical-domain datasets containing several hundred feature dimensions, with missing rates of 10% to 50%, are used for performance comparison.
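
The abstract reports missing rates of 10% to 50% without saying how they were produced. Assuming they are simulated by deleting cells completely at random (MCAR) from a complete dataset, a common experimental setup in missing-data studies, a minimal sketch of that step and of measuring the resulting rate follows; `inject_mcar` and `missing_rate` are hypothetical helper names.

```python
import random

def inject_mcar(X, rate, seed=0):
    """Return a copy of X in which round(rate * n_cells) randomly chosen cells
    are set to None (missing completely at random). X itself is not modified."""
    if not 0.0 <= rate <= 1.0:
        raise ValueError("rate must be between 0 and 1")
    rng = random.Random(seed)
    cells = [(i, j) for i, row in enumerate(X) for j in range(len(row))]
    out = [list(row) for row in X]
    for i, j in rng.sample(cells, round(rate * len(cells))):
        out[i][j] = None
    return out

def missing_rate(X):
    """Fraction of all cells in X that are missing (None)."""
    total = sum(len(row) for row in X)
    return sum(v is None for row in X for v in row) / total

complete = [[float(10 * i + j) for j in range(10)] for i in range(10)]  # 100 cells
for rate in (0.1, 0.2, 0.3, 0.4, 0.5):
    incomplete = inject_mcar(complete, rate, seed=42)
    print(f"target {rate:.0%}, observed {missing_rate(incomplete):.0%}")
```

This defines the rate per cell; a study could instead define it per row or per feature, which changes how many cases the case deletion baseline discards.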

Results: The proposed DLDTE achieves the highest classification accuracy compared with the baseline decision tree method, two missing value imputation methods (mean and k-nearest neighbor), and the case deletion method.
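
The abstract names three baselines without giving their settings. A minimal stdlib-only sketch of each follows, assuming missing values are stored as `None`; the value of `k`, the distance measure, and the fallbacks are illustrative choices, not the configuration used in the paper.

```python
import math

def case_deletion(X, y):
    """Case deletion: drop every row that has at least one missing value."""
    kept = [(row, lab) for row, lab in zip(X, y) if all(v is not None for v in row)]
    return [row for row, _ in kept], [lab for _, lab in kept]

def column_means(X):
    """Mean of the observed values in each column (0.0 if a column is fully missing)."""
    means = []
    for j in range(len(X[0])):
        observed = [row[j] for row in X if row[j] is not None]
        means.append(sum(observed) / len(observed) if observed else 0.0)
    return means

def mean_impute(X):
    """Mean imputation: replace each missing value with its column mean."""
    means = column_means(X)
    return [[means[j] if v is None else v for j, v in enumerate(row)] for row in X]

def knn_impute(X, k=2):
    """kNN imputation: fill X[i][j] with the mean of column j over the k nearest
    rows that observe column j. Distance is a root-mean-square difference over
    the features observed in both rows (an illustrative choice)."""
    means = column_means(X)  # fallback when no usable neighbour exists
    out = [list(row) for row in X]
    for i, row in enumerate(X):
        for j, v in enumerate(row):
            if v is not None:
                continue
            candidates = []
            for r, other in enumerate(X):
                if r == i or other[j] is None:
                    continue
                shared = [(a, b) for a, b in zip(row, other)
                          if a is not None and b is not None]
                if shared:
                    dist = math.sqrt(sum((a - b) ** 2 for a, b in shared) / len(shared))
                    candidates.append((dist, other[j]))
            candidates.sort(key=lambda c: c[0])
            nearest = [val for _, val in candidates[:k]]
            out[i][j] = sum(nearest) / len(nearest) if nearest else means[j]
    return out

X_toy = [
    [1.0, 2.0, None],
    [1.0, None, 3.0],
    [None, 2.0, 3.0],
    [10.0, 20.0, 30.0],
]
y_toy = [0, 0, 0, 1]
print(case_deletion(X_toy, y_toy))  # ([[10.0, 20.0, 30.0]], [1])
print(mean_impute(X_toy)[0])        # [1.0, 2.0, 12.0]
print(knn_impute(X_toy, k=2)[0])    # [1.0, 2.0, 3.0]
```

On this toy data, case deletion keeps a single row, the mean imputer is pulled toward the outlying fourth row, and kNN borrows from the similar rows. This only illustrates how the baselines differ; it says nothing about their performance on the paper's datasets.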

Conclusion: The results demonstrate the effectiveness of DLDTE for handling incomplete medical datasets with different missing rates.


Source journal

Technology and Health Care (HEALTH CARE SCIENCES & SERVICES; ENGINEERING, BIOMEDICAL)

CiteScore: 2.10
Self-citation rate: 6.20%
Articles published: 282
Review time: >12 weeks

Journal description: Technology and Health Care is intended to serve as a forum for the presentation of original articles and technical notes, observing rigorous scientific standards. Furthermore, upon invitation, reviews, tutorials, discussion papers and minisymposia are featured. The main focus of THC lies in the overlapping areas of engineering and medicine. The following types of contributions are considered:
1. Original articles: New concepts, procedures and devices associated with the use of technology in medical research and clinical practice are presented to a readership with a widespread background in engineering and/or medicine. In particular, the clinical benefit deriving from the application of engineering methods and devices in clinical medicine should be demonstrated. Full-length original contributions typically have a length of 4000 words, taking figures and tables into account.
2. Technical Notes and Short Communications: Technical Notes relate to novel technical developments with relevance for clinical medicine. In Short Communications, clinical applications are briefly described. Both Technical Notes and Short Communications typically have a length of 1500 words.
3. Reviews and Tutorials (upon invitation only): Tutorial and educational articles for persons with a primarily medical background on principles of engineering with particular significance for biomedical applications, and vice versa, are presented. The Editorial Board is responsible for the selection of topics.
4. Minisymposia (upon invitation only): Under the leadership of a Special Editor, controversial or important issues relating to health care are highlighted and discussed by various authors.
5. Letters to the Editors: Discussions or short statements (not indexed).