Tharunya Danabal, Neethi Sarah John, Abhijeet Pramod Ghawade, Pranjal Ahire
DOI: 10.2118/205877-ms (https://doi.org/10.2118/205877-ms). Published 2021-09-15 in Day 3 Thu, September 23, 2021.
Cognitive HSE Risk Prediction and Notification Tool Based on Natural Language Processing
This work develops a cognitive tool that predicts the most frequent HSE hazards with the highest potential severity levels. The tool identifies these risks by applying a natural language processing algorithm to HSE leading and lagging indicator reports submitted to an oilfield services company’s global HSE reporting system. Its purpose is to prioritize proactive actions and to focus efforts on raising workforce awareness.
A natural language processing algorithm was developed to identify priority HSE risks based on potential severity levels and frequency of occurrence. The algorithm uses vectorization, compression, and clustering methods to categorize the risks by potential severity and frequency using a formulated risk index methodology. In the pilot study, a user interface was developed to configure how often the prioritized HSE risks are communicated, and how many of them, to those employees in a given location who opted to receive the information.
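The abstract does not give the risk index formula, so the following is only a minimal sketch of how such a ranking could work. It assumes a hypothetical index equal to mean severity times relative frequency times cluster tightness (the coefficients the abstract names), a hypothetical 1 to 5 severity scale, and a hypothetical tightness score between 0 and 1. The names `RiskCluster`, `risk_index`, and `top_risks` are illustrative and do not come from the paper.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class RiskCluster:
    label: str
    severities: List[int]  # potential severity per report (hypothetical 1-5 scale)
    tightness: float       # 0..1, higher means more similar reports (hypothetical)


def risk_index(cluster: RiskCluster, total_reports: int) -> float:
    """Hypothetical index: mean severity x relative frequency x tightness."""
    frequency = len(cluster.severities) / total_reports
    mean_severity = sum(cluster.severities) / len(cluster.severities)
    return mean_severity * frequency * cluster.tightness


def top_risks(clusters: List[RiskCluster], k: int) -> List[RiskCluster]:
    """Return the k clusters with the highest index; k mirrors the
    user-configurable number of risks to communicate."""
    total = sum(len(c.severities) for c in clusters)
    ranked = sorted(clusters, key=lambda c: risk_index(c, total), reverse=True)
    return ranked[:k]


# Made-up example data, for illustration only.
clusters = [
    RiskCluster("slips and trips", [2, 2, 3, 2], 0.8),
    RiskCluster("dropped objects", [4, 5], 0.9),
    RiskCluster("driving", [3], 0.5),
]
for c in top_risks(clusters, k=2):
    print(c.label)
```

A multiplicative index is only one possible design choice; a weighted sum of the same coefficients would be equally consistent with the description above.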
From this pilot study using data reported in the company’s online HSE reporting system, the algorithm successfully identified five priority HSE risks across different hazard categories based on the risk index. Using a high volume of reporting data, the risk index factored in multiple coefficients, such as severity levels, frequency, and cluster tightness, to prioritize the HSE risks. The observations at each stage of the developed algorithm are as follows:
- In the data cleaning stage, all stop words (such as a, and, the) were removed, followed by tokenization to divide the text in the HSE reports into tokens and remove punctuation.
- In the vectorization stage, many vectors were formed using the Term Frequency - Inverse Document Frequency (TF-IDF) method.
- In the compression stage, an autoencoder removed the noise from the input data.
- In the agglomerative clustering stage, HSE reports with similar words were grouped into clusters, and the number of clusters generated per category was in the range of three to five.
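The stages above can be sketched roughly as follows, using only the Python standard library. This is an illustration under stated assumptions, not the authors' implementation: the stop-word list is a small illustrative subset, the TF-IDF weighting is one common variant, the clustering uses average linkage on cosine distance (the abstract does not state the linkage or metric), and the autoencoder compression stage is omitted because it would need a neural-network library. The sketch tokenizes first and then drops stop words, which yields the same tokens as the order described above. The sample reports are made up.

```python
import math
import re
from collections import Counter

# Illustrative subset only; a real pipeline would use a fuller stop-word list.
STOP_WORDS = {"a", "an", "and", "the", "on", "in", "of", "to", "from"}


def clean(text):
    """Lowercase, drop punctuation, split into tokens, remove stop words."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]


def tfidf(docs):
    """Return one sparse TF-IDF dict per tokenized document."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append(
            {t: (c / len(doc)) * (math.log(n / df[t]) + 1.0) for t, c in tf.items()}
        )
    return vectors


def cosine_distance(u, v):
    """1 - cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    if nu == 0.0 or nv == 0.0:
        return 1.0
    return 1.0 - dot / (nu * nv)


def agglomerate(vectors, n_clusters):
    """Average-linkage agglomerative clustering; returns lists of doc indices."""
    clusters = [[i] for i in range(len(vectors))]

    def linkage(a, b):
        total = sum(cosine_distance(vectors[i], vectors[j]) for i in a for j in b)
        return total / (len(a) * len(b))

    while len(clusters) > n_clusters:
        # Find and merge the closest pair of clusters.
        _, i, j = min(
            (linkage(clusters[p], clusters[q]), p, q)
            for p in range(len(clusters))
            for q in range(p + 1, len(clusters))
        )
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


# Made-up reports, for illustration only.
reports = [
    "Worker slipped on wet floor near the rig.",
    "Slipped on a wet floor in the workshop.",
    "Hand injury from dropped wrench.",
    "Dropped wrench caused a hand injury.",
]
vectors = tfidf([clean(r) for r in reports])
clusters = agglomerate(vectors, n_clusters=2)
print(clusters)
```

With these four reports, the two slip reports and the two dropped-wrench reports share no terms with each other's group, so they end up in separate clusters. In a fuller pipeline, the compression stage would sit between `tfidf` and `agglomerate`.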
The novelty of this approach is its ability to prioritize a location’s HSE risks using an algorithm containing natural language processing techniques. This cognitive tool treats reported HSE information as data to identify and flag priority HSE risks, factoring in the frequency of similar reports and their associated severity levels. The proof of concept has demonstrated the tool's potential. The next stage would be to test its predictive capabilities for injury prevention.