Enhancing safety of construction workers in Korea: an integrated text mining and machine learning framework for predicting accident types.

IF 2.3 | Zone 4, Medicine | Q2 PUBLIC, ENVIRONMENTAL & OCCUPATIONAL HEALTH | International Journal of Injury Control and Safety Promotion | Pub Date: 2024-06-01 | Epub Date: 2024-01-02 | DOI: 10.1080/17457300.2023.2300424
Joon Woo Yoo, Junsung Park, Heejun Park
International Journal of Injury Control and Safety Promotion, pp. 203-215. Journal Article. https://doi.org/10.1080/17457300.2023.2300424
Citations: 0

Abstract

Construction workers face a high risk of various occupational accidents, many of which can result in fatalities. This study aims to develop a prediction model for nine prevalent types of construction accidents, using construction tasks, activities, and tools/materials as input features, through the application of machine learning-based multi-class classification algorithms. A total of 152,867 construction accident summary reports, comprising both structured data (construction task, construction activity, accident type) and unstructured data (tools/materials), were used for the study. The study employed several data processing techniques to enhance model accuracy, including keyword extraction through text mining, Boruta feature selection, and SMOTE data resampling. Three performance metrics (multi-class area under the receiver operating characteristic curve (MAUC), multi-class Matthews correlation coefficient (MMCC), and geometric mean (G-mean)) were used to compare the predictive performance of four machine learning algorithms: decision tree, random forest, naïve Bayes, and XGBoost. Of the four algorithms, XGBoost showed the highest performance in predicting accident type (MAUC: 0.8603, MMCC: 0.3523, G-mean: 0.5009). Furthermore, a Shapley additive explanations (SHAP) analysis was conducted to visualize feature importance. The findings of this study contribute to improving construction safety by presenting a prediction model for accident types derived from real-world big data.
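The abstract does not spell out how MMCC and G-mean were computed, so the following is only a minimal sketch under stated assumptions: MMCC is taken to be the standard multi-class generalization of the Matthews correlation coefficient, and G-mean is taken to be the geometric mean of per-class recall. The accident labels (`fall`, `struck`, `caught`) and the toy predictions are hypothetical and are not taken from the study's data.

```python
from collections import Counter
from math import prod, sqrt


def multiclass_mcc(y_true, y_pred):
    """Multi-class Matthews correlation coefficient (assumed definition)."""
    n = len(y_true)
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    true_counts = Counter(y_true)  # how often each class truly occurs
    pred_counts = Counter(y_pred)  # how often each class is predicted
    labels = set(true_counts) | set(pred_counts)
    cov_tp = correct * n - sum(true_counts[k] * pred_counts[k] for k in labels)
    cov_tt = n * n - sum(c * c for c in true_counts.values())
    cov_pp = n * n - sum(c * c for c in pred_counts.values())
    denom = sqrt(cov_tt * cov_pp)
    # Return 0.0 when the score is undefined (e.g. only one class present).
    return cov_tp / denom if denom else 0.0


def g_mean(y_true, y_pred):
    """Geometric mean of per-class recall (assumed definition)."""
    true_counts = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    recalls = [hits[k] / true_counts[k] for k in true_counts]
    return prod(recalls) ** (1 / len(recalls))


# Hypothetical toy labels, for illustration only.
y_true = ["fall", "fall", "fall", "fall", "struck", "struck", "caught", "caught"]
y_pred = ["fall", "fall", "fall", "struck", "struck", "fall", "caught", "caught"]

print(f"MMCC:   {multiclass_mcc(y_true, y_pred):.4f}")
print(f"G-mean: {g_mean(y_true, y_pred):.4f}")
```

Because G-mean multiplies the per-class recalls, a single poorly recalled class pulls the score down sharply. Under these assumed definitions, that would be consistent with a model reporting a high MAUC alongside a moderate G-mean on imbalanced accident classes.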

Source journal
International Journal of Injury Control and Safety Promotion (PUBLIC, ENVIRONMENTAL & OCCUPATIONAL HEALTH)
CiteScore: 4.40
Self-citation rate: 13.00%
Articles published: 48
About the journal: International Journal of Injury Control and Safety Promotion (formerly Injury Control and Safety Promotion) publishes articles concerning all phases of injury control, including prevention, acute care and rehabilitation. Specifically, this journal will publish articles that, for each type of injury:
• describe the problem
• analyse the causes and risk factors
• discuss the design and evaluation of solutions
• describe the implementation of effective programs and policies
The journal encompasses all causes of fatal and non-fatal injury, including injuries related to:
• transport
• school and work
• home and leisure activities
• sport
• violence and assault