Combining BERT with numerical variables to classify injury leave based on accident description

Plínio MS Ramos, J. Macedo, Caio BS Maior, M. Moura, I. Lins
Journal: Proceedings of the Institution of Mechanical Engineers Part O-Journal of Risk and Reliability (Journal Article, published 2022-12-10)
DOI: 10.1177/1748006x221140194 (https://doi.org/10.1177/1748006x221140194)
Impact factor: 1.7; JCR: Q3 (Engineering, Industrial)
Citations: 1

Abstract

Work accidents may threaten workers' health and also have consequences for organizations, such as restructuring of work and direct and indirect costs arising from the worker's absence. In this context, accident investigation reports contain information that can help companies propose preventive and mitigative measures and identify the causes and consequences of injury events. However, this information is frequently complex, redundant, and/or incomplete. Additionally, a complete human review of the entire database is arduous, given the large number of reports a company produces. Natural Language Processing (NLP)-based techniques are well suited to analyzing massive amounts of textual information. In this paper, we adopt NLP techniques to determine whether an injury leave would be expected from a given accident report. The methodology was applied to accident reports collected from an actual hydroelectric power company using Bidirectional Encoder Representations from Transformers (BERT), a state-of-the-art NLP method. The text representations provided by the BERT model were combined with numerical and binary variables extracted from the accident reports. These combined variables are input to a Multilayer Perceptron (MLP) that predicts whether a given accident results in an injury leave. After cross-validation, the results showed a median accuracy of 73.5%. Additionally, we discuss several reports for which the tested models produced high and low proportions of correct classifications, along with possible reasons. Overall, accident investigation reports provide useful knowledge to support decisions in the safety context.
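The pipeline described in the abstract (text representation concatenated with numerical and binary variables, fed to an MLP, scored by cross-validation) can be illustrated with a minimal sketch. This is not the authors' implementation: the text vectors here are random stand-ins for BERT embeddings, the data are synthetic, and `combine`, `TinyMLP`, `cross_validate`, and `make_toy_data` are hypothetical names introduced only for illustration. The hidden-layer size, learning rate, and number of folds are likewise assumptions, since the abstract does not state them.

```python
import math
import random
import statistics


def combine(text_vec, numeric, binary):
    """Concatenate a text embedding with numerical and binary variables."""
    return list(text_vec) + [float(v) for v in numeric] + [float(b) for b in binary]


class TinyMLP:
    """Hypothetical one-hidden-layer perceptron (tanh hidden, sigmoid output)."""

    def __init__(self, n_in, n_hidden=8, lr=0.1, seed=0):
        rng = random.Random(seed)
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
        self.b2 = 0.0
        self.lr = lr

    def _forward(self, x):
        h = [math.tanh(sum(w * v for w, v in zip(row, x)) + b)
             for row, b in zip(self.w1, self.b1)]
        z = sum(w * v for w, v in zip(self.w2, h)) + self.b2
        z = max(-60.0, min(60.0, z))  # clamp to keep exp() in range
        return h, 1.0 / (1.0 + math.exp(-z))

    def fit(self, X, y, epochs=50):
        for _ in range(epochs):
            for x, t in zip(X, y):
                h, p = self._forward(x)
                dz = p - t  # gradient of the log loss w.r.t. the output pre-activation
                for j, hj in enumerate(h):
                    dh = dz * self.w2[j] * (1.0 - hj * hj)  # backprop through tanh
                    self.w2[j] -= self.lr * dz * hj
                    self.b1[j] -= self.lr * dh
                    for i, xi in enumerate(x):
                        self.w1[j][i] -= self.lr * dh * xi
                self.b2 -= self.lr * dz
        return self

    def predict(self, x):
        return int(self._forward(x)[1] >= 0.5)


def cross_validate(X, y, k=5, seed=0):
    """Return the accuracy on each of k held-out folds."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    accuracies = []
    for test_idx in folds:
        held_out = set(test_idx)
        train_idx = [i for i in idx if i not in held_out]
        model = TinyMLP(n_in=len(X[0]), seed=seed)
        model.fit([X[i] for i in train_idx], [y[i] for i in train_idx])
        hits = sum(model.predict(X[i]) == y[i] for i in test_idx)
        accuracies.append(hits / len(test_idx))
    return accuracies


def make_toy_data(n=60, dim=4, seed=1):
    """Synthetic data: label 1 = injury leave, label 0 = no leave (assumed coding)."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        label = rng.randint(0, 1)
        text_vec = [rng.gauss(label - 0.5, 1.0) for _ in range(dim)]  # stand-in for a BERT vector
        numeric = [rng.gauss(0.0, 1.0)]  # stand-in for a standardised numerical variable
        binary = [label if rng.random() < 0.8 else 1 - label]  # noisy binary variable
        X.append(combine(text_vec, numeric, binary))
        y.append(label)
    return X, y


X, y = make_toy_data()
fold_accuracies = cross_validate(X, y)
median_accuracy = statistics.median(fold_accuracies)
print("median accuracy over folds:", median_accuracy)
```

In a real setting the `text_vec` argument would presumably come from a pretrained BERT encoder applied to the accident description, and the median is taken over folds because it is less sensitive than the mean to a single unusually easy or hard fold. The accuracy printed here reflects only the synthetic data and says nothing about the 73.5% reported in the paper.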
Source journal
CiteScore: 4.50
Self-citation rate: 19.00%
Articles published: 81
Review time: 6-12 weeks
About the journal: The Journal of Risk and Reliability is for researchers and practitioners who are involved in the field of risk analysis and reliability engineering. The remit of the Journal covers concepts, theories, principles, approaches, methods and models for the proper understanding, assessment, characterisation and management of the risk and reliability of engineering systems. The journal welcomes papers which are based on mathematical and probabilistic analysis, simulation and/or optimisation, as well as works highlighting conceptual and managerial issues. Papers that provide perspectives on current practices and methods, and how to improve these, are also welcome.