Application of machine learning models and SHAP to examine crashes involving young drivers in New Jersey

Journal: International Journal of Transportation Science and Technology
DOI: 10.1016/j.ijtst.2023.04.005
Publication date: 2024-06-01
Publication type: Journal Article
Impact factor: 4.3 (JCR Q2, Transportation)
Article page: https://www.sciencedirect.com/science/article/pii/S2046043023000345
PDF: https://www.sciencedirect.com/science/article/pii/S2046043023000345/pdfft?md5=00851751ca9a7d9ae4b65b6eb418a6fe&pid=1-s2.0-S2046043023000345-main.pdf
Citation count: 0

Abstract

Motor vehicle crashes are the leading cause of death among teenagers in the United States. Young drivers show a higher propensity to be involved in crashes because of cellphone use while driving, speeding, and reckless driving. This study analyzed motor vehicle crashes involving young drivers using four years of New Jersey crash data (2016–2019). Several machine learning (ML) methods, including Random Forest, LightGBM, CatBoost, and XGBoost, were used to predict injury severity. Model performance was evaluated using accuracy, precision, and recall. In addition, interpretable ML techniques, including sensitivity analysis and Shapley values, were applied to assess the impact of the most influential factors on young driver-related crashes. XGBoost outperformed the Random Forest, CatBoost, and LightGBM models in crash severity prediction. The sensitivity analysis showed that multi-vehicle crashes, angular crashes, crashes at intersections, and dark-not-lit conditions were associated with increased crash severity. Partial dependence plots of SHAP values revealed that speeding in clear weather was associated with a higher likelihood of injury crashes, and that multi-vehicle crashes at intersections resulted in more injury crashes. We expect these results to help policymakers and practitioners take appropriate countermeasures to improve the safety of young drivers in New Jersey.
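
To make the Shapley-value idea in the abstract concrete, here is a minimal sketch that computes exact Shapley values for a toy injury-risk score. Everything in it is an assumption for illustration: the feature names, the coefficients in `toy_risk`, and the all-zeros baseline are made up and are not the study's variables, data, or fitted models. The study used SHAP with tree-based models, which estimates these quantities differently; this brute-force enumeration only shows what is being attributed.

```python
from itertools import combinations
from math import factorial

# Hypothetical binary crash features (illustrative names, not the study's variables).
FEATURES = ["speeding", "multi_vehicle", "intersection"]


def toy_risk(x):
    """Made-up injury-risk score with one interaction term; not a fitted model."""
    score = 0.10
    score += 0.15 * x["speeding"]
    score += 0.10 * x["multi_vehicle"]
    score += 0.05 * x["intersection"]
    # Interaction: multi-vehicle crashes at intersections add extra risk.
    score += 0.20 * x["multi_vehicle"] * x["intersection"]
    return score


def shapley_values(model, instance, baseline):
    """Exact Shapley values by enumerating every coalition of features.

    Features inside a coalition take the instance's values; features
    outside it take the baseline's values.
    """
    n = len(FEATURES)

    def value(coalition):
        x = {f: (instance[f] if f in coalition else baseline[f]) for f in FEATURES}
        return model(x)

    phi = {}
    for f in FEATURES:
        others = [g for g in FEATURES if g != f]
        total = 0.0
        for size in range(len(others) + 1):
            # Shapley weight for coalitions of this size: |S|! (n - |S| - 1)! / n!
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                total += weight * (value(s | {f}) - value(s))
        phi[f] = total
    return phi


crash = {"speeding": 1, "multi_vehicle": 1, "intersection": 1}
baseline = {"speeding": 0, "multi_vehicle": 0, "intersection": 0}
phi = shapley_values(toy_risk, crash, baseline)
for name, contribution in phi.items():
    print(f"{name}: {contribution:+.3f}")
```

In this toy setup the interaction term is split evenly between `multi_vehicle` and `intersection`, and the three contributions sum to the difference between the score for the crash and the score for the baseline. That additivity is what lets SHAP-style plots decompose a single prediction into per-factor effects.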

Source journal: International Journal of Transportation Science and Technology
Subject area: Engineering, Civil and Structural Engineering
CiteScore: 7.20
Self-citation rate: 0.00%
Articles published: 105
Review time: 88 days