Influence of measurement uncertainty on machine learning results demonstrated for a smart gas sensor

Journal of Sensors and Sensor Systems | Published: 2023-01-27 | DOI: 10.5194/jsss-12-45-2023 | IF 0.8 | JCR Q4 (Instruments & Instrumentation)

T. Dorst, T. Schneider, S. Eichstädt, A. Schütze

Citations: 2

Abstract

Humans spend most of their lives indoors, so indoor air quality (IAQ) plays a key role in human health. Thus, human health is seriously threatened by indoor air pollution, which leads to 3.8 × 10⁶ deaths annually, according to the World Health Organization (WHO). With the ongoing improvement in life quality, IAQ monitoring has become an important concern for researchers. However, in machine learning (ML), measurement uncertainty, which is critical in hazardous gas detection, is usually only estimated using cross-validation and is not directly addressed, and this will be the main focus of this paper. Gas concentration can be determined by using gas sensors in temperature-cycled operation (TCO) and ML on the measured logarithmic resistance of the sensor. This contribution focuses on formaldehyde as one of the most relevant carcinogenic gases indoors and on the sum of volatile organic compounds (VOCs), i.e., acetone, ethanol, formaldehyde, and toluene, measured in the data set as an indicator for IAQ. As gas concentrations are continuous quantities, regression must be used. Thus, a previously published uncertainty-aware automated ML toolbox (UA-AMLT) for classification is extended for regression by introducing an uncertainty-aware partial least squares regression (PLSR) algorithm. The uncertainty propagation of the UA-AMLT is based on the principles described in the Guide to the Expression of Uncertainty in Measurement (GUM) and its supplements. Two different use cases are considered for investigating the influence on ML results in this contribution, namely model training with raw data and with data that are manipulated by adding artificially generated white Gaussian or uniform noise to simulate increased data uncertainty, respectively. One of the benefits of this approach is to obtain a better understanding of where the overall system should be improved. This can be achieved by either improving the trained ML model or using a sensor with higher precision.
Finally, an increase in robustness against random noise by training a model with noisy data is demonstrated.
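
The abstract states two things: the UA-AMLT propagates uncertainty according to the GUM and its supplements, and increased data uncertainty is simulated with white Gaussian or uniform noise. The Python sketch below illustrates both ideas for the simplest possible case. It is not the toolbox's implementation. It makes two assumptions:

- The trained regressor reduces to a linear readout y = b0 + Σ w_i·x_i. A fitted PLSR model is linear in its inputs.
- The inputs are uncorrelated, so correlations between features are ignored.

`propagate_linear` applies the GUM law of propagation of uncertainty. `propagate_monte_carlo` checks it with a Monte Carlo propagation in the spirit of GUM Supplement 1. The function names, weights, and feature values are hypothetical.

```python
import math
import random
import statistics


def propagate_linear(weights, u_x):
    """GUM law of propagation of uncertainty for y = b0 + sum(w_i * x_i).

    Assumes uncorrelated inputs, so u(y)^2 = sum(w_i^2 * u(x_i)^2).
    For a linear model this expression is exact, not a first-order approximation.
    """
    return math.sqrt(sum((w * u) ** 2 for w, u in zip(weights, u_x)))


def draw(x, u, dist, rng):
    """Draw one value of an input with best estimate x and standard uncertainty u."""
    if dist == "gaussian":
        return rng.gauss(x, u)
    if dist == "uniform":
        # A rectangular distribution of half-width a has standard deviation a / sqrt(3),
        # so a = sqrt(3) * u yields the same standard uncertainty as the Gaussian case.
        a = math.sqrt(3.0) * u
        return rng.uniform(x - a, x + a)
    raise ValueError(f"unknown distribution: {dist!r}")


def propagate_monte_carlo(b0, weights, x, u_x, dist="gaussian", n=20000, seed=0):
    """Monte Carlo propagation in the spirit of GUM Supplement 1.

    Perturbs every input with noise of the chosen shape, evaluates the model
    for each draw, and returns the mean and standard deviation of the outputs.
    """
    rng = random.Random(seed)
    ys = []
    for _ in range(n):
        xs = [draw(xi, ui, dist, rng) for xi, ui in zip(x, u_x)]
        ys.append(b0 + sum(w * xi for w, xi in zip(weights, xs)))
    return statistics.fmean(ys), statistics.stdev(ys)


if __name__ == "__main__":
    # Hypothetical model: intercept, weights, and three log-resistance features
    # with their standard uncertainties (illustrative numbers only).
    b0, w = 1.0, [0.5, -1.2, 2.0]
    x, u = [3.1, 2.7, 4.0], [0.01, 0.02, 0.015]
    print("GUM u(y):", propagate_linear(w, u))
    for dist in ("gaussian", "uniform"):
        mean, std = propagate_monte_carlo(b0, w, x, u, dist=dist)
        print(dist, "MC mean:", mean, "MC u(y):", std)
```

For a linear model, both routes give the same standard uncertainty whichever noise shape is used, provided the uniform half-width is scaled by √3. For a nonlinear model the two can diverge, which is the situation GUM Supplement 1 addresses.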
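
The abstract's final claim is that training on noisy data makes the model more robust to random noise. This can be illustrated with a deliberately simple stand-in: ordinary least squares on a single synthetic feature, not the paper's PLSR models or data set. The sketch assumes zero-mean white Gaussian noise of the same level at training and test time. All names and numbers are hypothetical.

A model fitted on clean inputs keeps the full slope and is penalized when test inputs are noisy. A model fitted on inputs carrying the same noise learns an attenuated slope, which lowers the squared error under that noise.

```python
import random
import statistics


def fit_line(xs, ys):
    """Ordinary least squares fit of y = a + b * x; returns (a, b)."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sxy / sxx
    return my - b * mx, b


def mse(model, xs, ys):
    """Mean squared prediction error of a fitted line."""
    a, b = model
    return statistics.fmean((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))


def simulate(n=5000, slope=2.0, sigma_noise=0.5, seed=1):
    """Compare clean-trained and noise-trained models on noisy test inputs."""
    rng = random.Random(seed)

    def make_data(noisy):
        # True feature x ~ N(0, 1); target y = slope * x + small label noise.
        xs_true = [rng.gauss(0.0, 1.0) for _ in range(n)]
        ys = [slope * x + rng.gauss(0.0, 0.1) for x in xs_true]
        # Optionally add white Gaussian measurement noise to the observed feature.
        xs_obs = [x + rng.gauss(0.0, sigma_noise) if noisy else x for x in xs_true]
        return xs_obs, ys

    clean_model = fit_line(*make_data(noisy=False))
    noisy_model = fit_line(*make_data(noisy=True))
    test_x, test_y = make_data(noisy=True)  # deployment data carries the noise
    return mse(clean_model, test_x, test_y), mse(noisy_model, test_x, test_y)


if __name__ == "__main__":
    err_clean, err_noisy = simulate()
    print("MSE on noisy test data, trained on clean data:", err_clean)
    print("MSE on noisy test data, trained on noisy data:", err_noisy)
```

In this toy setting, the advantage of noise-aware training grows with the noise level and vanishes as `sigma_noise` approaches zero.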

Source journal

Journal of Sensors and Sensor Systems (Instruments & Instrumentation)

CiteScore: 2.30
Self-citation rate: 10.00%
Review time: 23 weeks

Journal description: Journal of Sensors and Sensor Systems (JSSS) is an international open-access journal dedicated to the science, application, and advancement of sensors and of sensors as part of measurement systems. The emphasis is on sensor principles and phenomena, measuring systems, sensor technologies, and applications. The goal of JSSS is to provide a platform for scientists and professionals in academia, as well as for developers, engineers, and users, to discuss new developments and advancements in sensors and sensor systems.