A Distributed Big Data Analytics Models for Traffic Accidents Classification and Recognition based SparkMlLib Cores

Imad El Mallahi, Jamal Riffi, Hamid Tairi, Abderrahamane Ez-Zahout, Mohamed Adnane Mahraz
DOI: 10.14313/jamris/4-2022/34 (https://doi.org/10.14313/jamris/4-2022/34)
Journal: Journal of Automation, Mobile Robotics and Intelligent Systems
Publication date: 2023-10-02
Publication type: Journal Article
JCR: Q4 (Engineering)
Citations: 0

Abstract

In this paper, our focus is on predicting the severity of traffic accidents, which represents a significant advancement in road accident management. Addressing this issue holds crucial implications for emergency logistical planning within urban areas. To assess accident severity within congested settings, we analyze the potential consequences of accidents, aiming to enhance the effectiveness of accident management protocols. In the context of this study, we introduce a real-time big data project. Our approach involves the implementation and comparison of various machine learning algorithms, namely Random Forest, Support Vector Machine (SVM), and Artificial Neural Network (ANN). The objective is to accurately classify and predict the severity of traffic accidents. Our methodology revolves around the real-time capture of incoming datasets, which are then stored within a Hadoop Distributed File System cluster. Subsequently, we leverage the core functionality of Spark MLlib, making use of pre-implemented Lambda functions. Throughout the project, classification and recognition tasks are conducted as part of the data locality processing paradigm. To validate our approach, we utilize a confusion matrix, which enables us to gauge the interclass impacts among Pedestrians, Vehicles or pillion passengers, and Drivers or riders. For empirical validation, we employ the TRAFFIC ACCIDENTS_2019_LEEDS dataset sourced from the Road Safety Department of Transport. This dataset facilitates the classification of severity predictions into three distinct categories: Pedestrian, vehicle or pillion passenger, and driver or rider. Notably, our experiments reveal impressive results. The Random Forest algorithm achieves an accuracy rate of 93%, outperforming SVM at 82% and ANN at 87%. Furthermore, in terms of precision-recall metrics, Random Forest also excels with a score of 93.82%, compared to SVM's 82.22% and ANN's 87.88%.
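The abstract validates the classifiers with a confusion matrix over three classes and reports accuracy and precision-recall figures. As a minimal sketch of how those quantities relate, the following plain-Python example builds a three-class confusion matrix and reads accuracy, precision, and recall off it. It is not the authors' Spark MLlib code; the function names and the toy labels are hypothetical and were invented for illustration, so the numbers it produces have no connection to the reported results.

```python
from collections import Counter

# The three classes named in the abstract.
CLASSES = ["Pedestrian", "Vehicle or pillion passenger", "Driver or rider"]


def confusion_matrix(y_true, y_pred, classes=CLASSES):
    """Return a nested dict: matrix[true_label][predicted_label] -> count."""
    counts = Counter(zip(y_true, y_pred))
    return {t: {p: counts[(t, p)] for p in classes} for t in classes}


def accuracy(matrix):
    """Fraction of samples that lie on the diagonal of the matrix."""
    total = sum(sum(row.values()) for row in matrix.values())
    correct = sum(matrix[c][c] for c in matrix)
    return correct / total if total else 0.0


def precision_recall(matrix, label):
    """Per-class precision (column-wise) and recall (row-wise)."""
    tp = matrix[label][label]
    predicted = sum(matrix[t][label] for t in matrix)  # column sum
    actual = sum(matrix[label].values())               # row sum
    precision = tp / predicted if predicted else 0.0
    recall = tp / actual if actual else 0.0
    return precision, recall


# Hypothetical toy labels, invented for illustration (not from the dataset).
y_true = ["Pedestrian", "Pedestrian", "Driver or rider", "Driver or rider",
          "Vehicle or pillion passenger", "Driver or rider"]
y_pred = ["Pedestrian", "Driver or rider", "Driver or rider", "Driver or rider",
          "Vehicle or pillion passenger", "Pedestrian"]

matrix = confusion_matrix(y_true, y_pred)
print("accuracy:", accuracy(matrix))
for label in CLASSES:
    print(label, precision_recall(matrix, label))
```

The off-diagonal cells show which classes are confused with one another (here, one pedestrian predicted as a driver or rider and one driver or rider predicted as a pedestrian), which is the kind of interclass effect the abstract refers to.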
Source journal
Journal of Automation, Mobile Robotics and Intelligent Systems (Engineering: Control and Systems Engineering)
CiteScore: 1.10
Self-citation rate: 0.00%
Articles published: 25
Journal scope: Fundamentals of automation and robotics; Applied automatics; Mobile robots control; Distributed systems; Navigation; Mechatronics systems in robotics; Sensors and actuators; Data transmission; Biomechatronics; Mobile computing