Boon Chong Choo, Musab Abdul Razak, Mohd Zahirasri Mohd Tohir, D. A. Awang Biak, Syafiie Syam
Pertanika Journal of Science and Technology. Published 2024-04-01. DOI: 10.47836/pjst.32.3.07 (https://doi.org/10.47836/pjst.32.3.07). Impact factor: 0.6. JCR: Q3, Multidisciplinary Sciences.
Citations: 0
Abstract
An Accident Prediction Model Based on ARIMA in Kuala Lumpur, Malaysia, Using Time Series of Actual Accidents and Related Data
Analysing time series data with sophisticated tools to fit optimal time series models is an emerging trend. To date, Malaysian industrial accident data remain underutilised and lack informative records. This paper therefore investigates the Malaysian accident database and evaluates which forecasting models are optimal for accident prediction. The model input was based on data available from the Department of Occupational Safety and Health, Malaysia (DOSH), from 2018 to 2021, with 80% of the dataset used to train the models and the remaining 20% used for validation. Predictions based on the negative binomial and Poisson distributions showed mean absolute percentage errors (MAPE) of 33% and 51%, respectively, indicating that the negative binomial distribution performed better than the Poisson distribution in predicting accident frequency. The time series accident data covered four years, and stationarity was checked with the Augmented Dickey-Fuller test in R Studio. ARIMA models were considered after the data showed autocorrelation. The lowest Akaike Information Criterion (AIC), Bayesian Information Criterion (BIC) and other error values were used to justify the best model, which was ARIMA(2,0,2)(2,0,0)(12). The MAPE values for ARIMA in R and for the manual time series were 40% and 49%, respectively. Therefore, accident prediction using R Studio would outperform the manual negative binomial and Poisson distribution predictions. Based on these findings, industrial safety practitioners should report accidents to DOSH truthfully in the era of digitalisation, which could enable future data-driven accident predictions.
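The evaluation described in the abstract (an 80%/20% train/validation split scored by MAPE) can be illustrated with a minimal Python sketch. It is not the authors' code: the paper worked in R Studio, the monthly counts below are invented purely for illustration and are not DOSH data, monthly granularity is an assumption suggested by the seasonal period of 12, and the flat mean forecast is only a placeholder for a fitted model. The helper names `chronological_split` and `mape` are hypothetical.

```python
from typing import List, Sequence, Tuple


def chronological_split(
    series: Sequence[float], train_fraction: float = 0.8
) -> Tuple[List[float], List[float]]:
    """Split a time series in time order: earliest part trains, the rest validates."""
    cut = int(len(series) * train_fraction)
    return list(series[:cut]), list(series[cut:])


def mape(actual: Sequence[float], forecast: Sequence[float]) -> float:
    """Mean absolute percentage error, in percent."""
    if len(actual) == 0 or len(actual) != len(forecast):
        raise ValueError("actual and forecast must be non-empty and equally long")
    if any(a == 0 for a in actual):
        # Each term divides by the actual value, so zeros make MAPE undefined.
        raise ValueError("MAPE is undefined when an actual value is zero")
    total = sum(abs((a - f) / a) for a, f in zip(actual, forecast))
    return 100.0 * total / len(actual)


# Invented monthly accident counts for four years (48 values). NOT DOSH data.
monthly_counts = [
    31, 28, 35, 30, 27, 33, 36, 29, 32, 34, 30, 26,
    29, 27, 33, 31, 25, 30, 34, 28, 31, 35, 29, 27,
    22, 20, 18, 15, 17, 21, 24, 23, 25, 26, 24, 22,
    24, 23, 27, 26, 21, 19, 22, 25, 27, 28, 26, 24,
]

train, valid = chronological_split(monthly_counts, train_fraction=0.8)

# Placeholder forecast: repeat the training mean for every validation month.
# A fitted count model or ARIMA forecast would replace this line.
baseline = sum(train) / len(train)
forecast = [baseline] * len(valid)

print(f"train months: {len(train)}, validation months: {len(valid)}")
print(f"MAPE of the placeholder forecast: {mape(valid, forecast):.1f}%")
```

The split is chronological rather than random because shuffling a time series would let later observations leak into training. A lower MAPE means a closer fit on the validation months, which is how the abstract ranks 33% ahead of 51%.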
About the journal:
Pertanika Journal of Science and Technology aims to provide a forum for high-quality research in science and engineering. Areas relevant to the scope of the journal include: bioinformatics, bioscience, biotechnology and bio-molecular sciences, chemistry, computer science, ecology, engineering, engineering design, environmental control and management, mathematics and statistics, medicine and health sciences, nanotechnology, physics, safety and emergency management, and related fields of study.