Synthetic Data Generation Using Generative Adversarial Network (GAN) for Burst Failure Risk Analysis of Oil and Gas Pipelines
Authors: R. K. Mazumder, Gourav Modanwal, Yue Li
Journal: ASCE-ASME Journal of Risk and Uncertainty in Engineering Systems Part B-Mechanical Engineering
Published: 2023-06-15
DOI: https://doi.org/10.1115/1.4062741
Although pipelines are the safest mode of oil and gas transportation, pipeline failure rates have increased significantly over the last decade, particularly for aging pipelines. Predicting failure risk and prioritizing the riskiest assets in a large set of pipelines is a demanding task for utilities. Machine Learning (ML) has recently shown promising results in pipeline failure risk prediction. However, because of safety and security concerns, obtaining enough operation and failure data to train ML models accurately is a significant challenge. To overcome this limited access to operational data, this study employed a Generative Adversarial Network (GAN) based framework to generate synthetic pipeline data (DSyn, N = 100) from a 70% subset of experimental burst test data (DExp) compiled from the literature (N = 92). The proposed framework was tested on (1) real data and (2) combined real and synthetic data. The burst failure risk of corroded oil and gas pipelines was determined using probabilistic approaches, and pipelines were classified into two classes: (1) low risk (pf: 0-0.5) and (2) high risk (pf > 0.5). Two Random Forest (RF) models were trained: MExp on the training subset of the experimental data (DExp, N = 64) and MComb on the combined data (DExp + DSyn, N = 164). Both models were validated on the remaining 30% of the experimental data (N = 28). The validation results indicate that adding synthetic data can further improve the performance of the ML models: the area under the ROC curve was 0.96 for the model trained on real data (MExp) and 0.99 for the model trained on combined data (MComb). The improved combined model can support strategic resilience planning for oil and gas pipelines, which involves long-term critical decisions on maintenance and potential pipe replacement.
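The data split, the two-class risk rule, and the ROC-based comparison described in the abstract can be illustrated with a short sketch. This is not the authors' code: the function names, the round-up rule for the test share, and the toy pf values and scores are all assumptions made for illustration, and no GAN or Random Forest is trained here. The AUC is computed with the standard pairwise (rank) definition.

```python
import math


def split_sizes(n_total, test_fraction=0.30):
    """Return (n_train, n_test). Rounding the test share up is an assumption;
    the abstract only reports the resulting sizes."""
    n_test = math.ceil(n_total * test_fraction)
    return n_total - n_test, n_test


def risk_class(pf, threshold=0.5):
    """Two-class rule from the abstract: 'high' only if pf > threshold."""
    if not 0.0 <= pf <= 1.0:
        raise ValueError("pf must lie in [0, 1]")
    return "high" if pf > threshold else "low"


def roc_auc(labels, scores):
    """Area under the ROC curve: the probability that a randomly chosen
    'high' case receives a larger score than a randomly chosen 'low' case,
    with ties counted as one half."""
    pos = [s for y, s in zip(labels, scores) if y == "high"]
    neg = [s for y, s in zip(labels, scores) if y == "low"]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Sample sizes reported in the abstract: 92 experimental tests, 100 synthetic.
n_train, n_test = split_sizes(92)
n_combined = n_train + 100

# Hypothetical failure probabilities and model scores (illustrative only).
toy_pf = [0.05, 0.40, 0.50, 0.65, 0.90]
toy_scores = [0.10, 0.30, 0.70, 0.60, 0.95]
toy_labels = [risk_class(pf) for pf in toy_pf]
toy_auc = roc_auc(toy_labels, toy_scores)

print(n_train, n_test, n_combined)
print(toy_labels, round(toy_auc, 3))
```

With these inputs the split reproduces the reported 64/28 partition and the combined training size of 164. The toy AUC has no relation to the 0.96 and 0.99 values in the abstract; it only shows how the metric rewards a model that ranks high-risk pipes above low-risk ones.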