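Machine Learning Techniques for Anomaly Detection in High-Frequency Time Series of Wind Speed and Greenhouse Gas Concentration Measurements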

Moscow University Physics Bulletin. Published 2024-01-17. DOI: 10.3103/S0027134923070135. Impact factor 0.4; JCR Q4 (Physics, Multidisciplinary); Zone 4 (Physics and Astrophysics).
A. J. Kasatkin, M. A. Krinitskiy
Volume 78, Supplement 1, pages S138-S148. https://link.springer.com/article/10.3103/S0027134923070135
Citations: 0

Machine Learning Techniques for Anomaly Detection in High-Frequency Time Series of Wind Speed and Greenhouse Gas Concentration Measurements

Fluxes of greenhouse gases (GHG) may be assessed in situ with the eddy covariance method by processing high-frequency measurements of gas concentration and wind speed acquired at certain sites, e.g., carbon measurement test areas of the pilot project of the Ministry of Education and Science of Russia. The measurements commonly come with noise, anomalies, and gaps of various kinds. These anomalies result in biased GHG flux estimates. There are a number of empirical and heuristic approaches for filtering noise and anomalies, as well as for gap-filling. These approaches are characterized by many tuning parameters that are commonly adjusted by an expert, which is a limiting factor for large-scale deployment of GHG monitoring stations. In this study, we propose an alternative approach for anomaly detection in high-frequency measurements of GHG concentration and wind speed. Our approach is based on machine learning techniques and has fewer tuning parameters. The goal of our study is to develop a fully automated data preprocessing routine based on machine learning algorithms. We collected a dataset of high-frequency GHG concentration and wind speed measurements from one of the carbon measurement test areas. To compare anomaly detection algorithms, we labeled anomalies in a subset of this dataset. We present two approaches for anomaly detection, namely: (a) identification of outliers based on the error magnitude in time series statistical forecasts performed by a machine learning (ML) algorithm; and (b) classification of anomalies using an ML model trained on the labeled dataset of outliers mentioned above. We compared the approaches and algorithms based on the F1-score metric assessed with respect to an expert-labeled subset of anomalies in GHG concentration and wind speed time series.
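The forecast-error idea in approach (a) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: a plain least-squares autoregressive model stands in for the ML forecasters, and the lag count `n_lags`, the robust threshold factor `k`, and the synthetic series are all hypothetical choices made for the example.

```python
import numpy as np


def lag_matrix(series, n_lags):
    """Row t holds the n_lags values that precede series[t + n_lags]."""
    n = len(series)
    return np.column_stack([series[i:n - n_lags + i] for i in range(n_lags)])


def fit_ar(series, n_lags):
    """Fit an AR(n_lags) model with intercept by ordinary least squares."""
    X = lag_matrix(series, n_lags)
    X = np.column_stack([np.ones(len(X)), X])
    y = series[n_lags:]
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


def forecast_errors(series, coef):
    """Absolute one-step-ahead forecast error for every sample."""
    n_lags = len(coef) - 1
    pred = coef[0] + lag_matrix(series, n_lags) @ coef[1:]
    err = np.zeros(len(series))  # first n_lags samples have no forecast
    err[n_lags:] = np.abs(series[n_lags:] - pred)
    return err


def flag_anomalies(series, n_lags=5, k=6.0):
    """Flag samples whose forecast error exceeds a robust (MAD) threshold."""
    coef = fit_ar(series, n_lags)
    err = forecast_errors(series, coef)
    valid = err[n_lags:]
    med = np.median(valid)
    mad = np.median(np.abs(valid - med))
    threshold = med + k * 1.4826 * mad  # assumed rule, not from the paper
    return err > threshold


# Synthetic stand-in for a high-frequency measurement with two spikes.
rng = np.random.default_rng(0)
t = np.arange(2000)
signal = np.sin(2 * np.pi * t / 200) + 0.05 * rng.standard_normal(t.size)
signal[700] += 3.0
signal[1500] -= 3.0

mask = flag_anomalies(signal)
print("flagged indices:", np.flatnonzero(mask))
```

Samples immediately after a spike may also be flagged, because the spike contaminates the lagged inputs of the next few forecasts. A real pipeline would need to handle this, for example by replacing flagged values before forecasting further.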
Within the forecast-error-based approach, we trained several ML models: the ARIMA autoregression method, the CatBoost model for autoregression, the CatBoost model for forecasting employing additional features, and the LSTM artificial neural network. Within the supervised classification approach, we tested the CatBoost classification model. We demonstrate that ML models for forecasting deliver a high quality of time series prediction within the autoregression approach. We also show that the anomaly identification method based on the autoregression approach delivers the best quality with the F1-score reaching 0.812.
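The F1-score used for the comparison is the harmonic mean of precision and recall on the anomaly class. Below is a minimal sketch of a point-wise computation over boolean per-sample masks. The abstract does not say whether samples or whole anomaly segments were scored, so the point-wise convention and the toy labels are assumptions made for illustration.

```python
import numpy as np


def f1_score_pointwise(y_true, y_pred):
    """F1 for the positive (anomaly) class from per-sample labels."""
    y_true = np.asarray(y_true, dtype=bool)
    y_pred = np.asarray(y_pred, dtype=bool)
    tp = np.sum(y_true & y_pred)    # anomalies correctly detected
    fp = np.sum(~y_true & y_pred)   # normal samples flagged by mistake
    fn = np.sum(y_true & ~y_pred)   # anomalies that were missed
    if tp == 0:
        return 0.0  # convention: no true positives gives F1 = 0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return float(2 * precision * recall / (precision + recall))


# Toy example: hypothetical expert labels versus detector output.
expert = [0, 0, 1, 1, 0, 0, 1, 0]
detected = [0, 1, 1, 1, 0, 0, 0, 0]
score = f1_score_pointwise(expert, detected)
print(f"F1 = {score:.3f}")
```

Because anomalies are rare in such series, F1 is more informative than plain accuracy: a detector that flags nothing would score high accuracy but an F1 of zero.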
