Data augmentation using conditional generative adversarial network (cGAN): applications for sewer condition classification and testing using different machine learning techniques
Authors: Haile Woldesellasse, Solomon Tesfamariam
Journal: Journal of Hydroinformatics (impact factor 2.2; JCR Q3, Computer Science, Interdisciplinary Applications)
Published: 2024-06-13
DOI: https://doi.org/10.2166/hydro.2024.135
Abstract
The increasing availability of condition assessment data highlights the challenge of managing data imbalance in the asset management of aging infrastructure. Aging sewer pipes pose significant threats to health and the environment, underscoring the importance of proactive management practices that improve asset maintenance and mitigate the associated risks. Machine learning (ML) models are widely used to model the complex deterioration process of sewer pipes, but their performance suffers when they are trained on imbalanced condition grade data. This paper addresses the issue by proposing a novel data augmentation approach based on a conditional generative adversarial network (cGAN). Generating synthetic data for the minority classes balances the skewed distribution of the sewer dataset, which supports more robust and accurate predictive models. The utility of the proposed method is evaluated by training different ML classifiers: neural network (NN), decision tree, quadratic discriminant analysis, Naïve Bayes, support vector machine (SVM), and k-nearest neighbor. Of these, the quadratic discriminant analysis, Naïve Bayes, NN, and SVM classifiers showed improvement. The cGAN-based data augmentation method also outperformed two other techniques for handling data imbalance: random under-sampling and cost-sensitive NN. Consequently, data generated by the cGAN can aid asset management by supporting the development of proactive classifiers that accurately predict pipes at a high risk of failure.
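To make the balancing idea concrete, the following is a minimal sketch in plain Python, not the authors' implementation. It computes how many synthetic samples each minority class would need to match the majority class (the quantity a generator such as a cGAN would be asked to produce), and it shows random under-sampling, one of the two baselines named in the abstract. The function names, the `pipe_id` field, and the toy condition grades are all hypothetical and are not taken from the paper.

```python
import random
from collections import Counter


def synthetic_counts_needed(labels):
    """Per-class number of synthetic samples needed to match the majority class."""
    counts = Counter(labels)
    target = max(counts.values())
    return {cls: target - n for cls, n in counts.items()}


def random_undersample(rows, labels, seed=0):
    """Randomly drop rows so every class keeps as many rows as the rarest class."""
    rng = random.Random(seed)
    by_class = {}
    for row, label in zip(rows, labels):
        by_class.setdefault(label, []).append(row)
    keep = min(len(group) for group in by_class.values())
    out_rows, out_labels = [], []
    for label, group in by_class.items():
        for row in rng.sample(group, keep):
            out_rows.append(row)
            out_labels.append(label)
    return out_rows, out_labels


# Hypothetical condition grades (1 = good ... 3 = poor), not data from the paper.
grades = [1] * 6 + [2] * 3 + [3] * 1
pipes = [{"pipe_id": i} for i in range(len(grades))]

needed = synthetic_counts_needed(grades)
under_rows, under_labels = random_undersample(pipes, grades)
print(needed)
print(Counter(under_labels))
```

On this toy data, under-sampling keeps a single pipe per grade and discards most of the majority-class records, whereas augmentation keeps every real record and adds synthetic ones only to the minority grades. That difference is one plausible reason an augmentation approach could compare favorably, though the abstract itself does not state the cause.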
About the journal:
Journal of Hydroinformatics is a peer-reviewed journal devoted to the application of information technology in the widest sense to problems of the aquatic environment. It promotes Hydroinformatics as a cross-disciplinary field of study, combining technological, human-sociological and more general environmental interests, including an ethical perspective.