A machine learning approach to quality-control Argo temperature data
Authors: Qi Zhang, Chenyan Qian, Changming Dong
Journal: Atmospheric and Oceanic Science Letters, volume 16, issue 4, Article 100292
Publication date: 2023-07-01
DOI: 10.1016/j.aosl.2022.100292
URL: https://www.sciencedirect.com/science/article/pii/S1674283422001751
Journal impact factor: 2.3 (JCR Q3, Meteorology & Atmospheric Sciences)
Citations: 3
Abstract
A machine learning approach is proposed to identify temperature outliers in Argo float profiles as a complement to current Argo quality control. An unsupervised machine learning classifier, the Gaussian mixture model (GMM), clusters the Argo data into classes; convex hulls, the smallest convex polygons enclosing all the data points, are then constructed from these classes. Temperature data are flagged as good or bad according to whether they fall inside or outside the polygons, using point-in-polygon analysis implemented with the ray casting algorithm. With the South China Sea as a test case, the approach identified more than 70% of the profiles containing outliers and marked the outliers automatically at the same time. This highlights the potential of the proposed methodology as a complementary quality control method.
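The hull-and-ray-casting step described in the abstract can be illustrated with a minimal, standard-library sketch. It is not the authors' implementation. The abstract does not state which variables span the polygon plane, so the (temperature, pressure) pairs below are an assumption, as are the helper names `convex_hull`, `point_in_polygon`, and `flag_outliers` and the synthetic numbers. The GMM clustering step is omitted: `reference_class` stands in for the points of one class that a GMM would have produced.

```python
from typing import List, Tuple

Point = Tuple[float, float]  # assumed layout: (temperature, pressure)


def convex_hull(points: List[Point]) -> List[Point]:
    """Andrew's monotone chain; returns hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o: Point, a: Point, b: Point) -> float:
        # z-component of (a - o) x (b - o); > 0 means a left turn
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower: List[Point] = []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)

    upper: List[Point] = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)

    # The last point of each chain is the first point of the other chain.
    return lower[:-1] + upper[:-1]


def point_in_polygon(point: Point, polygon: List[Point]) -> bool:
    """Ray casting: shoot a ray toward +x and count edge crossings (odd = inside).

    Points lying exactly on the boundary are ambiguous under this test.
    """
    x, y = point
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # edge straddles the horizontal line through y
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def flag_outliers(profile: List[Point], hull: List[Point]) -> List[int]:
    """Return indices of profile samples that fall outside the hull."""
    return [i for i, p in enumerate(profile) if not point_in_polygon(p, hull)]


# Synthetic stand-in for one class of reference data (hypothetical values).
reference_class = [(28.0, 0.0), (29.0, 0.0), (29.0, 50.0), (27.0, 50.0), (28.5, 20.0)]
hull = convex_hull(reference_class)

# Synthetic profile; the last sample is deliberately far too warm.
profile = [(28.4, 10.0), (28.2, 30.0), (31.0, 40.0)]
outlier_indices = flag_outliers(profile, hull)
```

Because the hull is built to enclose every reference point, reference points that are themselves hull vertices sit on the boundary, where plain ray casting gives no reliable answer. A production check would need an explicit boundary rule or a small tolerance.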