Combining BERT with numerical variables to classify injury leave based on accident description

IF 1.8 | Zone 4, Engineering & Technology | Q3 ENGINEERING, INDUSTRIAL | Proceedings of the Institution of Mechanical Engineers Part O-Journal of Risk and Reliability | Pub Date: 2022-12-10 | DOI: 10.1177/1748006x221140194

Plínio MS Ramos, J. Macedo, Caio BS Maior, M. Moura, I. Lins


Citations: 1

Abstract

Work accidents threaten workers' health and also burden organizations, for example through work restructuring and the direct and indirect costs of a worker's absence. Accident investigation reports contain information that can help companies propose preventive and mitigative measures and identify the causes and consequences of injury events. However, this information is frequently complex, redundant, and/or incomplete, and a complete human review of the entire database is arduous given the number of reports a company produces. Natural Language Processing (NLP) techniques are well suited to analyzing such large volumes of text. In this paper, we adopt NLP techniques to determine whether injury leave would be expected from a given accident report. The methodology was applied to accident reports collected from an actual hydroelectric power company, using Bidirectional Encoder Representations from Transformers (BERT), a state-of-the-art NLP method. The text representations provided by the BERT model were combined with numerical and binary variables extracted from the accident reports. These combined variables are the input to a Multilayer Perceptron (MLP) that predicts whether a given accident results in injury leave. After cross-validation, the results showed a median accuracy of 73.5%. Additionally, we examine several reports for which the tested models produced high or low proportions of correct classifications and discuss possible reasons. Overall, accident investigation reports provide useful knowledge to support decisions in the safety context.
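
The pipeline described in the abstract (text representation, concatenation with numerical and binary variables, an MLP classifier, cross-validated accuracy summarized by its median) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes random placeholder vectors in place of real BERT embeddings, a tiny hand-written one-hidden-layer MLP, synthetic labels, and 5 folds; the abstract does not state the network size or the cross-validation scheme. All names (`fuse`, `TinyMLP`, `cross_val_accuracies`, `make_synthetic`) are hypothetical.

```python
import math
import random
import statistics


def fuse(text_vec, numeric, binary):
    """Concatenate a text embedding with numerical and binary variables."""
    return list(text_vec) + [float(v) for v in numeric] + [float(b) for b in binary]


class TinyMLP:
    """One-hidden-layer perceptron (tanh hidden, sigmoid output) trained by SGD."""

    def __init__(self, n_in, n_hidden=8, lr=0.1, seed=0):
        rng = random.Random(seed)
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
        self.b2 = 0.0
        self.lr = lr

    def _forward(self, x):
        h = [math.tanh(sum(w * v for w, v in zip(row, x)) + b)
             for row, b in zip(self.w1, self.b1)]
        z = sum(w * v for w, v in zip(self.w2, h)) + self.b2
        z = max(-30.0, min(30.0, z))  # clamp to keep exp() well-behaved
        return h, 1.0 / (1.0 + math.exp(-z))

    def fit(self, X, y, epochs=50):
        for _ in range(epochs):
            for x, t in zip(X, y):
                h, p = self._forward(x)
                g = p - t  # cross-entropy gradient w.r.t. the output logit
                for j, hj in enumerate(h):
                    gh = g * self.w2[j] * (1.0 - hj * hj)  # backprop through tanh
                    self.w2[j] -= self.lr * g * hj
                    self.b1[j] -= self.lr * gh
                    for i, xi in enumerate(x):
                        self.w1[j][i] -= self.lr * gh * xi
                self.b2 -= self.lr * g
        return self

    def predict(self, x):
        return int(self._forward(x)[1] >= 0.5)


def cross_val_accuracies(X, y, k=5, seed=0):
    """Return one held-out accuracy per fold (k is an assumption, not from the paper)."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    accs = []
    for fold in folds:
        held_out = set(fold)
        train = [i for i in idx if i not in held_out]
        model = TinyMLP(len(X[0]), seed=seed)
        model.fit([X[i] for i in train], [y[i] for i in train])
        accs.append(sum(model.predict(X[i]) == y[i] for i in fold) / len(fold))
    return accs


def make_synthetic(n=60, dim=4, seed=1):
    """Synthetic stand-in data; real inputs would be BERT vectors plus report variables."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        text_vec = [rng.gauss(0, 1) for _ in range(dim)]  # placeholder for a BERT vector
        numeric = [rng.uniform(0, 1)]                     # hypothetical scaled numerical variable
        binary = [rng.randint(0, 1)]                      # hypothetical binary variable
        X.append(fuse(text_vec, numeric, binary))
        y.append(int(text_vec[0] + binary[0] > 0.5))      # synthetic "injury leave" label
    return X, y


X, y = make_synthetic()
accs = cross_val_accuracies(X, y)
median_acc = statistics.median(accs)
print(f"fold accuracies: {accs}, median: {median_acc:.3f}")
```

In a real setting, `text_vec` would come from a pretrained BERT encoder applied to the accident description, and the MLP would normally be built with an established machine learning library rather than written by hand. The median is used here only because the abstract reports a median accuracy.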

Source journal: Proceedings of the Institution of Mechanical Engineers Part O-Journal of Risk and Reliability (ENGINEERING, MULTIDISCIPLINARY; ENGINEERING, INDUSTRIAL)

CiteScore: 4.50

Self-citation rate: 19.00%

Articles published: (not listed)

Review time: 6-12 weeks

About the journal: The Journal of Risk and Reliability is for researchers and practitioners who are involved in the field of risk analysis and reliability engineering. The remit of the Journal covers concepts, theories, principles, approaches, methods and models for the proper understanding, assessment, characterisation and management of the risk and reliability of engineering systems. The journal welcomes papers which are based on mathematical and probabilistic analysis, simulation and/or optimisation, as well as works highlighting conceptual and managerial issues. Papers that provide perspectives on current practices and methods, and how to improve these, are also welcome.