Geochemical Biodegraded Oil Classification Using a Machine Learning Approach
Sizenando Bispo-Silva, Cleverson J. Ferreira de Oliveira, Gabriel de Alemar Barberes
Geosciences (Switzerland), published 2023-10-24. DOI: 10.3390/geosciences13110321 (https://doi.org/10.3390/geosciences13110321)
Abstract
Chromatographic oil analysis is an important step in identifying biodegraded petroleum: geochemists inspect chromatogram peaks and interpret the phenomena that explain the oil's geochemistry. These analyses, however, are comparative and visual, and consequently slow. This article aims to improve the chromatogram analysis performed during geochemical interpretation by proposing the use of Convolutional Neural Networks (CNNs), a deep learning technique widely used by large technology companies. Two hundred and twenty-one chromatographic images of oils from basins worldwide (Brazil, the USA, Portugal, Angola, and Venezuela) were used. The images were processed by a CNN in the open-source software Orange Data Mining. The CNN extracts recurring features from the images, pixel by pixel, through convolutional operations, and these features are then grouped into common feature groups. Training yielded a classification accuracy (CA) of 96.7% and an area under the Receiver Operating Characteristic (ROC) curve (AUC) of 99.7%; testing yielded a CA of 97.6% and an AUC of 99.7%. This work suggests that processing petroleum chromatographic images with CNNs can become a new tool for petroleum geochemistry, since chromatograms can be loaded, read, grouped, and classified more efficiently and quickly than with classical evaluation methods.
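The two technical ideas in the abstract, feature extraction by convolution and evaluation by CA and AUC, can be illustrated with a minimal pure-Python sketch. This is not the authors' Orange Data Mining workflow, which the abstract does not detail. The toy image, the kernel, the labels, the scores, and all function names below are hypothetical, and the AUC shown is the binary case, whereas a multi-class study would typically average over classes.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def conv2d_valid(image: Matrix, kernel: Matrix) -> Matrix:
    """Slide `kernel` over `image` (no padding, stride 1) and return the feature map.

    As in most CNN libraries, the kernel is not flipped, so this is
    technically a cross-correlation.
    """
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j]
                for i in range(kh) for j in range(kw))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


def classification_accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of samples whose predicted label equals the true label (CA)."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Binary AUC: probability that a random positive sample is scored above
    a random negative one, with ties counted as one half."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical 3x5 "chromatogram" patch with one vertical peak in the middle column.
toy_image = [
    [0, 0, 1, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 1, 0, 0],
]
# Hypothetical kernel that responds strongly to a narrow vertical line.
peak_kernel = [
    [-1, 2, -1],
    [-1, 2, -1],
    [-1, 2, -1],
]
feature_map = conv2d_valid(toy_image, peak_kernel)  # [[-3, 6, -3]]

# Hypothetical labels (1 = biodegraded) and classifier scores for six samples.
labels = [1, 1, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.3, 0.6, 0.4, 0.1]
predictions = [1 if s >= 0.5 else 0 for s in scores]

ca = classification_accuracy(labels, predictions)  # 4 of 6 correct
auc = roc_auc(labels, scores)                      # 8 of 9 positive/negative pairs ranked correctly
```

The feature map peaks where the kernel is centred on the vertical line, which is the sense in which a convolution "detects" a recurring pattern. CA depends on the 0.5 decision threshold chosen here, while AUC depends only on how the scores rank the samples, which is why the two metrics can differ for the same model.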