Predicting Gas-Particle Partitioning Coefficients of Atmospheric Molecules with Machine Learning

Emma Lumiaro, M. Todorović, T. Kurtén, H. Vehkamäki, P. Rinke
DOI: 10.5194/ACP-2020-1258 (https://doi.org/10.5194/ACP-2020-1258)
Journal: arXiv: Chemical Physics
Published: 2020-10-27
Citations: 3

Predicting Gas-Particle Partitioning Coefficients of Atmospheric Molecules with Machine Learning
Abstract. The formation, properties and lifetime of secondary organic aerosols in the atmosphere are largely determined by the gas-particle partitioning coefficients of the participating organic vapours. Since these coefficients are often difficult to measure and to compute, we developed a machine learning model to predict them given molecular structure as input. Our data-driven approach is based on the dataset by Wang et al. (Atmos. Chem. Phys., 17, 7529 (2017)), who computed the partitioning coefficients and saturation vapour pressures of 3414 atmospheric oxidation products from the Master Chemical Mechanism using the COSMOtherm program. We trained a kernel ridge regression (KRR) machine learning model on the saturation vapour pressure (Psat) and on two equilibrium partitioning coefficients: between a water-insoluble organic matter phase and the gas phase (KWIOM/G), and between an infinitely dilute solution with pure water and the gas phase (KW/G). For the input representation of the atomic structure of each organic molecule, we tested different descriptors. We find that the many-body tensor representation (MBTR) works best for our application, but the topological fingerprint (TopFP) approach is almost as good and is significantly more cost-effective. Our best machine learning model (KRR with a Gaussian kernel + MBTR) predicts Psat and KWIOM/G to within 0.3 logarithmic units and KW/G to within 0.4 logarithmic units of the original COSMOtherm calculations. This is equal to or better than the typical accuracy of COSMOtherm predictions compared to experimental data (where available). We then applied our machine learning model to a dataset of 35,383 molecules that we generated based on a ten-carbon backbone functionalized with 0 to 6 carboxyl, carbonyl or hydroxyl groups, to evaluate its performance for polyfunctional compounds with potentially low Psat. The resulting saturation vapour pressure and partitioning coefficient distributions were physico-chemically reasonable, and the volatility predictions for the most highly oxidized compounds were in qualitative agreement with experimentally inferred volatilities of atmospheric oxidation products with similar elemental composition.
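To make the regression step in the abstract concrete, the following is a minimal sketch of kernel ridge regression with a Gaussian kernel, written in plain Python. It is not the authors' implementation: the function names (`gaussian_kernel`, `solve_linear_system`, `krr_fit`, `krr_predict`), the hyperparameter values, and the toy data are all assumptions made for illustration. The one-dimensional grid stands in for a molecular descriptor vector (such as MBTR or TopFP), and the sine target stands in for a logarithmic property value; neither comes from the paper's dataset.

```python
import math


def gaussian_kernel(x, y, sigma):
    """Gaussian (RBF) kernel between two descriptor vectors."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def solve_linear_system(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Work on an augmented copy so the caller's matrix is left untouched.
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def krr_fit(X, y, sigma, lam):
    """Return regression weights alpha solving (K + lam * I) alpha = y."""
    n = len(X)
    K = [[gaussian_kernel(X[i], X[j], sigma) for j in range(n)] for i in range(n)]
    for i in range(n):
        K[i][i] += lam  # ridge regularisation on the diagonal
    return solve_linear_system(K, y)


def krr_predict(X_train, alpha, X_new, sigma):
    """Predict as a kernel-weighted sum over the training points."""
    return [
        sum(a * gaussian_kernel(x, xt, sigma) for a, xt in zip(alpha, X_train))
        for x in X_new
    ]


# Toy data (synthetic, NOT from the paper): grid "descriptors", smooth target.
X_train = [[0.1 * i] for i in range(21)]
y_train = [math.sin(2.0 * x[0]) for x in X_train]

# sigma and lam are arbitrary illustrative choices, not tuned values.
alpha = krr_fit(X_train, y_train, sigma=0.5, lam=1e-6)

# Evaluate at held-out midpoints between the training grid points.
X_test = [[0.1 * i + 0.05] for i in range(20)]
y_pred = krr_predict(X_train, alpha, X_test, sigma=0.5)
mae = sum(abs(p - math.sin(2.0 * x[0])) for p, x in zip(y_pred, X_test)) / len(X_test)
print(f"toy held-out mean absolute error: {mae:.4f}")
```

In practice one would more likely rely on an established library; for example, scikit-learn's `KernelRidge` with `kernel="rbf"` implements the same model, with its `gamma` parameter corresponding to 1/(2σ²) in the notation used here. The kernel width and regularisation strength would normally be chosen by cross-validation rather than fixed by hand as in this sketch.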