Deep learning based decision tree ensembles for incomplete medical datasets
Authors: Chien-Hung Chiu, Shih-Wen Ke, Chih-Fong Tsai, Wei-Chao Lin, Min-Wei Huang, Yi-Hsiu Ko
Journal: Technology and Health Care, pp. 75-87, published 2024-01-01 (impact factor 1.4; JCR Q4, Engineering, Biomedical)
DOI: 10.3233/THC-220514 (https://doi.org/10.3233/THC-220514)
Background: In practice, datasets collected for analysis are usually incomplete because some records contain missing attribute values. Many related works focus on constructing specific models that estimate replacements for the missing values, so that the originally incomplete datasets become complete. Another type of solution handles the incomplete datasets directly, without missing value imputation, with decision trees being the major technique for this purpose.
Objective: To introduce a novel approach, namely Deep Learning-based Decision Tree Ensembles (DLDTE), which borrows the bounding box and sliding window strategies used in deep learning techniques to divide an incomplete dataset into a number of subsets and learns from each subset with a decision tree, resulting in decision tree ensembles.
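As an illustration only, the idea of dividing an incomplete dataset with a sliding window and learning one tree per subset might look like the sketch below. The abstract does not say how the bounding box or window is defined, so every detail here is an assumption rather than the authors' procedure: the window slides over feature columns, each subset keeps only the rows that are complete inside its window, a depth-1 tree (a decision stump) stands in for a full decision tree, and predictions are combined by majority vote. The names `sliding_windows`, `fit_stump`, `fit_ensemble`, and `predict` are hypothetical.

```python
from collections import Counter


def sliding_windows(n_features, width, stride):
    """Yield tuples of column indices as a window slides over the features."""
    start = 0
    while start < n_features:
        yield tuple(range(start, min(start + width, n_features)))
        start += stride


def fit_stump(X, y, cols):
    """Fit a depth-1 tree: the (feature, threshold) with the fewest training errors."""
    best = None
    for j in cols:
        values = sorted({row[j] for row in X})
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [label for row, label in zip(X, y) if row[j] <= t]
            right = [label for row, label in zip(X, y) if row[j] > t]
            l_lab = Counter(left).most_common(1)[0][0]
            r_lab = Counter(right).most_common(1)[0][0]
            errors = sum(v != l_lab for v in left) + sum(v != r_lab for v in right)
            if best is None or errors < best[0]:
                best = (errors, j, t, l_lab, r_lab)
    if best is None:  # no usable split: always predict the majority class
        majority = Counter(y).most_common(1)[0][0]
        return (cols[0], float("inf"), majority, majority)
    return best[1:]


def fit_ensemble(X, y, width, stride):
    """Train one stump per window, using only rows complete inside that window."""
    stumps = []
    for cols in sliding_windows(len(X[0]), width, stride):
        keep = [i for i, row in enumerate(X) if all(row[j] is not None for j in cols)]
        if keep:  # skip windows in which no row is complete
            stumps.append(fit_stump([X[i] for i in keep], [y[i] for i in keep], cols))
    return stumps, Counter(y).most_common(1)[0][0]


def predict(model, row):
    """Majority vote over the stumps whose split feature is observed in `row`."""
    stumps, fallback = model
    votes = Counter()
    for j, t, l_lab, r_lab in stumps:
        if row[j] is not None:
            votes[l_lab if row[j] <= t else r_lab] += 1
    return votes.most_common(1)[0][0] if votes else fallback


# Toy data (made up for illustration); None marks a missing value.
X = [
    [1.0, None, 0.2, 5.0],
    [1.2, 3.0, None, 5.1],
    [0.9, 3.1, 0.1, None],
    [4.0, None, 0.9, 9.0],
    [4.2, 7.0, None, 9.2],
    [3.9, 7.1, 0.8, None],
]
y = [0, 0, 0, 1, 1, 1]

model = fit_ensemble(X, y, width=2, stride=2)
print(predict(model, [1.1, None, None, 5.0]))
print(predict(model, [None, 7.0, 0.85, None]))
```

In this sketch no value is ever imputed: a row with gaps still contributes to every window in which it happens to be complete, and at prediction time a stump simply abstains when its feature is missing.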
Method: Two medical-domain datasets, each containing several hundred feature dimensions, are used for performance comparison, with missing rates ranging from 10% to 50%.
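A minimal sketch of how missing rates of 10% to 50% could be simulated on a complete dataset is given below. The abstract does not state the missingness mechanism or the step size, so this assumes values are removed completely at random and that the rate increases in steps of 10 percentage points; `inject_missing` is a hypothetical helper and the data are synthetic.

```python
import random


def inject_missing(rows, rate, seed=0):
    """Return a copy of `rows` with round(rate * n_cells) entries set to None."""
    rng = random.Random(seed)
    cells = [(i, j) for i in range(len(rows)) for j in range(len(rows[0]))]
    n_missing = round(rate * len(cells))
    masked = [list(r) for r in rows]
    # Assumption: cells are chosen uniformly at random (missing completely at random).
    for i, j in rng.sample(cells, n_missing):
        masked[i][j] = None
    return masked


# Synthetic complete dataset: 20 rows x 10 features = 200 cells.
complete = [[float(i * 10 + j) for j in range(10)] for i in range(20)]

for rate in (0.1, 0.2, 0.3, 0.4, 0.5):
    masked = inject_missing(complete, rate)
    n_missing = sum(v is None for row in masked for v in row)
    print(f"rate={rate:.0%}: {n_missing} of 200 cells missing")
```

Fixing the seed keeps the masks reproducible, so each method can be compared on the same missing pattern at each rate.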
Results: The proposed DLDTE achieves the highest classification accuracy compared with the baseline decision tree method, two missing value imputation methods (mean and k-nearest neighbor), and the case deletion method.
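For readers unfamiliar with the comparison methods named above, the following sketch shows one common way to implement case deletion, mean imputation, and k-nearest neighbor imputation. It is not the authors' implementation: the function names, the distance measure (root mean squared difference over the features observed in both rows), and the toy data are all assumptions.

```python
import math


def case_deletion(rows):
    """Drop every row that has at least one missing (None) value."""
    return [r for r in rows if all(v is not None for v in r)]


def mean_impute(rows):
    """Replace each None with the mean of the observed values in its column."""
    means = []
    for j in range(len(rows[0])):
        observed = [r[j] for r in rows if r[j] is not None]
        means.append(sum(observed) / len(observed) if observed else 0.0)
    return [[means[j] if v is None else v for j, v in enumerate(r)] for r in rows]


def knn_impute(rows, k=2):
    """Replace each None with the column mean over the k nearest donor rows."""

    def distance(a, b):
        # Assumption: compare only the features observed in both rows.
        shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
        if not shared:
            return math.inf
        return math.sqrt(sum((x - y) ** 2 for x, y in shared) / len(shared))

    filled = [list(r) for r in rows]
    for i, row in enumerate(rows):
        for j, v in enumerate(row):
            if v is not None:
                continue
            # Donors are the other rows that have column j observed.
            donors = sorted(
                (distance(row, other), other[j])
                for m, other in enumerate(rows)
                if m != i and other[j] is not None
            )[:k]
            if donors:
                filled[i][j] = sum(val for _, val in donors) / len(donors)
    return filled


# Toy data (made up for illustration); None marks a missing value.
rows = [
    [1.0, 2.0, 3.0],
    [1.1, None, 3.1],
    [5.0, 6.0, None],
    [5.2, 6.2, 7.0],
]

print(case_deletion(rows))
print(mean_impute(rows))
print(knn_impute(rows, k=1))
```

Case deletion shrinks the dataset, which becomes costly as the missing rate grows, while the two imputation methods keep every row but replace the gaps with estimates.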
Conclusion: The results demonstrate the effectiveness of DLDTE for handling incomplete medical datasets with different missing rates.
About the journal:
Technology and Health Care is intended to serve as a forum for the presentation of original articles and technical notes, observing rigorous scientific standards. Furthermore, upon invitation, reviews, tutorials, discussion papers and minisymposia are featured. The main focus of THC is related to the overlapping areas of engineering and medicine. The following types of contributions are considered:
1. Original articles: New concepts, procedures and devices associated with the use of technology in medical research and clinical practice are presented to a readership with a broad background in engineering and/or medicine. In particular, the clinical benefit deriving from the application of engineering methods and devices in clinical medicine should be demonstrated. Typically, full-length original contributions have a length of 4000 words, with figures and tables duly taken into account.
2. Technical Notes and Short Communications: Technical Notes relate to novel technical developments with relevance for clinical medicine. In Short Communications, clinical applications are briefly described. Both Technical Notes and Short Communications typically have a length of 1500 words.
3. Reviews and Tutorials (upon invitation only): Tutorial and educational articles for persons with a primarily medical background on principles of engineering with particular significance for biomedical applications, and vice versa, are presented. The Editorial Board is responsible for the selection of topics.
4. Minisymposia (upon invitation only): Under the leadership of a Special Editor, controversial or important issues relating to health care are highlighted and discussed by various authors.
5. Letters to the Editors: Discussions or short statements (not indexed).