Null-values imputation using different modification random forest algorithm

Maad M. Mijwil, Alaa Wagih Abdulqader, Sura Mazin Ali, A. Sadiq
DOI: 10.11591/ijai.v12.i1.pp374-383 (https://doi.org/10.11591/ijai.v12.i1.pp374-383)
Journal: IAES International Journal of Artificial Intelligence (JCR Q2, Decision Sciences)
Published: 2023-03-01
Citations: 1

Abstract

Today, the world lives in an era of information and data, so it has become vital to collect data and keep it in databases in order to process it and extract essential details. Null values arise during this processing; they significantly affect tasks such as analysis and prediction and lead to inaccurate outcomes. To address this, the authors modify the random forest technique to impute null values in datasets obtained from the University of California Irvine (UCI) machine learning repository. The datasets used are connectionist bench, phishing websites, breast cancer, ionosphere, and COVID-19. The modified random forest algorithm rests on three modifications and is evaluated with three numbers of null values. Samples are chosen with the proposed less-redundancy bootstrap. Each tree receives distinctive features through hybrid feature selection. The final result is determined by ranked voting for classification. The study found that the modified random forest algorithm achieved better accuracy than the traditional algorithm, as it relied on four parameters and attained sufficient accuracy in imputing null values: accuracy improved by 9.5%, 6.5%, and 5.25% for one, two, and three null values in the same row of the datasets, respectively.
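The abstract names a less-redundancy bootstrap and ranked voting but does not define either, so the following is only a minimal sketch of one plausible reading, not the authors' method. It assumes that "less redundancy" means capping how often a row may repeat in a bootstrap sample, and that "ranked voting" means weighting each tree's vote by the rank of its validation accuracy. The names `less_redundant_bootstrap`, `ranked_vote`, `max_repeats`, and `tree_scores` are hypothetical.

```python
import random
from collections import Counter, defaultdict


def less_redundant_bootstrap(n_rows, max_repeats=2, seed=0):
    """Draw n_rows row indices with replacement, capping repeats per row.

    Assumption: the cap (max_repeats) stands in for the paper's
    "less redundancy" idea; the paper's actual rule is not given.
    """
    if max_repeats < 1:
        raise ValueError("max_repeats must be at least 1")
    rng = random.Random(seed)
    counts = Counter()
    sample = []
    while len(sample) < n_rows:
        i = rng.randrange(n_rows)
        if counts[i] < max_repeats:  # reject draws that exceed the cap
            counts[i] += 1
            sample.append(i)
    return sample


def ranked_vote(predictions, tree_scores):
    """Pick a label by rank-weighted voting over per-tree predictions.

    predictions: one predicted label per tree.
    tree_scores: one validation accuracy per tree (assumed to be available).
    The lowest-scoring tree gets weight 1, the highest gets weight len(trees).
    """
    order = sorted(range(len(tree_scores)), key=lambda t: tree_scores[t])
    weights = {tree: rank + 1 for rank, tree in enumerate(order)}
    totals = defaultdict(float)
    for tree, label in enumerate(predictions):
        totals[label] += weights[tree]
    return max(totals, key=totals.get)


if __name__ == "__main__":
    # Five hypothetical trees imputing one missing categorical value.
    predictions = ["yes", "yes", "no", "no", "no"]
    tree_scores = [0.95, 0.90, 0.60, 0.55, 0.50]
    print(Counter(predictions).most_common(1)[0][0])  # plain majority vote
    print(ranked_vote(predictions, tree_scores))      # rank-weighted vote
    print(less_redundant_bootstrap(10, max_repeats=2, seed=1))
```

In this toy case the plain majority picks "no" (three votes to two), while the rank-weighted vote picks "yes" because the two most accurate trees carry weights 5 and 4 against 3, 2, and 1. Whether the paper ranks trees, classes, or something else is not stated in the abstract.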
Source journal: IAES International Journal of Artificial Intelligence
Subject area: Decision Sciences, Information Systems and Management
CiteScore: 3.90
Self-citation rate: 0.00%
Articles published: 170