Enhancing Weak Nodes in Decision Tree Algorithm Using Data Augmentation
Youness Manzali, Mohamed El far, M. Chahhou, Mohammed Elmohajir
Journal: Cybernetics and Information Technologies (impact factor 1.2; JCR Q4, Computer Science, Information Systems)
Published: 2022-06-01
DOI: 10.2478/cait-2022-0016 (https://doi.org/10.2478/cait-2022-0016)
Citations: 1
Abstract
Decision trees are among the most popular classifiers in machine learning, artificial intelligence, and pattern recognition because they are accurate and easy to interpret. During tree construction, a node containing too few observations (a weak node) can still be split, and the resulting split is unreliable and has no statistical value. Existing methods such as pruning, which removes the tree's non-meaningful parts, can address this issue. This paper handles weak nodes differently: we introduce a new algorithm, Enhancing Weak Nodes in Decision Tree (EWNDT), which reinforces weak nodes by augmenting their data with observations from other, similar tree nodes. We call this data augmentation a virtual merging because the merge is only temporary, serving to recalculate the best splitting attribute and the best threshold in the weak node. We use two approaches to define the similarity between two nodes. The experimental results are verified on benchmark datasets from the UCI machine-learning repository and indicate that the EWNDT algorithm performs well.
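
The virtual merging idea can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not specify the two similarity measures, so the centroid distance used here, the Gini criterion, the `min_samples` cutoff, and all function names are assumptions made for illustration only.

```python
import numpy as np


def gini(y):
    """Gini impurity of a label array (0.0 for an empty array)."""
    if len(y) == 0:
        return 0.0
    _, counts = np.unique(y, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - float(np.sum(p ** 2))


def best_split(X, y):
    """Exhaustive search for the (feature, threshold, score) with the
    lowest weighted Gini impurity. Returns (None, None, inf) if no
    split is possible."""
    best = (None, None, np.inf)
    n = len(y)
    for j in range(X.shape[1]):
        values = np.unique(X[:, j])
        # Candidate thresholds: midpoints between consecutive distinct values.
        for t in (values[:-1] + values[1:]) / 2.0:
            left = X[:, j] <= t
            score = (left.sum() * gini(y[left])
                     + (~left).sum() * gini(y[~left])) / n
            if score < best[2]:
                best = (j, float(t), float(score))
    return best


def centroid_distance(Xa, Xb):
    """ASSUMED similarity measure: Euclidean distance between centroids."""
    return float(np.linalg.norm(Xa.mean(axis=0) - Xb.mean(axis=0)))


def split_weak_node(X_weak, y_weak, other_nodes, min_samples=10):
    """Choose a split for a node, borrowing data if the node is weak.

    other_nodes is a list of (X, y) pairs taken from other tree nodes.
    The borrowed samples are used only to pick the split; they are not
    added to the weak node itself (the "virtual" part of the merge).
    """
    if len(y_weak) >= min_samples or not other_nodes:
        return best_split(X_weak, y_weak)[:2]
    # Pick the most similar node under the assumed distance.
    X_sim, y_sim = min(other_nodes,
                       key=lambda node: centroid_distance(X_weak, node[0]))
    X_aug = np.vstack([X_weak, X_sim])
    y_aug = np.concatenate([y_weak, y_sim])
    return best_split(X_aug, y_aug)[:2]


# Toy data: a weak node with two observations and two candidate donors.
X_weak = np.array([[1.0, 0.0], [2.0, 1.0]])
y_weak = np.array([0, 1])
near = (np.array([[2.0, 0.0], [1.0, 1.0], [2.0, 0.25], [1.0, 0.75]]),
        np.array([0, 1, 0, 1]))
far = (np.array([[50.0, 50.0], [60.0, 60.0]]), np.array([0, 1]))

alone = best_split(X_weak, y_weak)[:2]
feat, thr = split_weak_node(X_weak, y_weak, [far, near], min_samples=5)
print("split from weak node alone:", alone)
print("split after virtual merge: ", (feat, thr))
```

On this toy data the weak node alone cannot tell which of its two features matters, so the search settles on the first one; after borrowing from the nearer node, the chosen split moves to the second feature. The child nodes would then be built from the weak node's own observations only, for example with `X_weak[:, feat] <= thr`.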