Predicting possible recommendations related to causes and consequences in the HAZOP study worksheet using natural language processing and machine learning: BERT, clustering, and classification

IF 3.6; Zone 3 (Engineering Technology); Q2 (ENGINEERING, CHEMICAL); Journal of Loss Prevention in The Process Industries; Pub Date: 2024-04-01; DOI: 10.1016/j.jlp.2024.105310

Ali Ekramipooya, Mehrdad Boroushaki, Davood Rashtchian


Citation count: 0

Abstract

A set of recommendations is one of the most valuable outputs of the hazard and operability (HAZOP) study. The HAZOP study team provides recommendations when deficiencies are detected in the chemical process plant. These deficiencies can cause chemical process accidents and operability issues. This study employed a data-driven approach using natural language processing (NLP) and machine learning (ML) to predict potential recommendations based on causes and consequences. The dataset was unlabeled, so clustering was used to label it. First, bidirectional encoder representations from transformers (BERT) converted recommendation sentences into vectors. Second, uniform manifold approximation and projection (UMAP) and hierarchical density-based spatial clustering of applications with noise (HDBSCAN) were used to determine recommendation categories and label the dataset. Then, BERT was used to convert causes and consequences into vectors. Finally, a multi-layer perceptron (MLP) classifier was employed to predict possible recommendations based on causes and consequences. The class imbalance problem was handled by random over-sampling. The prediction accuracy for possible recommendations was 93.7% when based on causes and 89.5% when based on consequences. Predicting potential recommendations from causes and consequences helps ensure that major recommendations are not overlooked during the HAZOP study. This can further expand NLP and ML applications in HAZOP study automation.
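The abstract states that class imbalance was handled by random over-sampling but gives no implementation details. The following is a minimal sketch of that general technique using only the Python standard library, not the authors' code. The function name `random_oversample`, the toy 2-D vectors (stand-ins for BERT sentence vectors), and the category labels are all hypothetical and chosen only for illustration.

```python
import random
from collections import Counter


def random_oversample(features, labels, seed=0):
    """Duplicate randomly chosen minority-class samples until every class
    has as many samples as the largest class."""
    rng = random.Random(seed)
    # Group sample indices by class label.
    by_class = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)
    target = max(len(idxs) for idxs in by_class.values())

    chosen = []
    for idxs in by_class.values():
        chosen.extend(idxs)  # keep every original sample
        # Draw extra samples with replacement to reach the target size.
        chosen.extend(rng.choices(idxs, k=target - len(idxs)))
    rng.shuffle(chosen)
    return [features[i] for i in chosen], [labels[i] for i in chosen]


# Hypothetical toy data: 2-D stand-ins for sentence vectors and
# made-up recommendation-category labels (assumptions, not from the paper).
X = [[0.1, 0.2], [0.2, 0.1], [0.3, 0.3], [0.9, 0.8], [0.5, 0.4]]
y = ["alarm", "alarm", "alarm", "valve", "procedure"]

X_bal, y_bal = random_oversample(X, y)
print(Counter(y_bal))  # class counts after balancing
```

In common practice, over-sampling of this kind is applied to the training split only, so that duplicated samples do not leak into the evaluation data. Whether the study followed that practice is not stated in the abstract.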




Source journal

Journal of Loss Prevention in The Process Industries (Engineering Technology - Engineering: Chemical)

CiteScore: 7.20

Self-citation rate: 14.30%

Number of publications: 226

Review time: 52 days

Journal description: The broad scope of the journal is process safety. Process safety is defined as the prevention and mitigation of process-related injuries and damage arising from process incidents involving fire, explosion and toxic release. Such undesired events occur in the process industries during the use, storage, manufacture, handling, and transportation of highly hazardous chemicals.