Similarity-Based Analysis of Atmospheric Organic Compounds for Machine Learning Applications

Hilda Sandström, Patrick Rinke
arXiv - PHYS - Data Analysis, Statistics and Probability, published 2024-06-26. DOI: arxiv-2406.18171 (https://doi.org/arxiv-2406.18171)

Abstract

The formation of aerosol particles in the atmosphere impacts air quality and climate change, but many of the organic molecules involved remain unknown. Machine learning could aid in identifying these compounds through accelerated analysis of molecular properties and detection characteristics. However, such progress is hindered by the current lack of curated datasets for atmospheric molecules and their associated properties. To tackle this challenge, we propose a similarity analysis that connects atmospheric compounds to existing large molecular datasets used for machine learning development. Using molecular representations that are standard in machine learning applications, we find only a small overlap between atmospheric and non-atmospheric molecules. The identified out-of-domain character of atmospheric compounds is related to their distinct functional groups and atomic composition. Our investigation underscores the need for collaborative efforts to gather and share more molecular-level atmospheric chemistry data. The presented similarity-based analysis can be used for future dataset curation for machine learning development in the atmospheric sciences.
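The abstract does not state which similarity measure or fingerprint is used. As a minimal sketch of the general idea only, the example below assumes binary molecular fingerprints (represented as sets of on-bit indices) and Tanimoto similarity, a common choice for such comparisons. The fingerprints, the names `ref_a`, `ref_b`, `atm_x`, `atm_y`, and the helper functions are hypothetical and are not taken from the paper.

```python
from typing import Dict, FrozenSet, Tuple

Fingerprint = FrozenSet[int]  # indices of the bits that are "on"


def tanimoto(a: Fingerprint, b: Fingerprint) -> float:
    """Tanimoto (Jaccard) similarity between two sets of on-bits."""
    if not a and not b:
        return 0.0  # convention chosen here for two empty fingerprints
    return len(a & b) / len(a | b)


def nearest_reference(
    query: Fingerprint, reference: Dict[str, Fingerprint]
) -> Tuple[str, float]:
    """Return the name and similarity of the closest reference fingerprint."""
    best_name, best_sim = "", -1.0
    for name, fp in reference.items():
        sim = tanimoto(query, fp)
        if sim > best_sim:
            best_name, best_sim = name, sim
    return best_name, best_sim


# Hypothetical toy fingerprints, not real molecules.
reference = {
    "ref_a": frozenset({1, 2, 3, 4}),
    "ref_b": frozenset({10, 11, 12}),
}
atmospheric = {
    "atm_x": frozenset({1, 2, 3, 9}),
    "atm_y": frozenset({20, 21}),
}

# For each atmospheric compound, find its most similar reference compound.
coverage = {
    name: nearest_reference(fp, reference) for name, fp in atmospheric.items()
}

for name, (ref_name, sim) in coverage.items():
    print(f"{name}: nearest={ref_name}, similarity={sim:.2f}")
```

In this toy setting, a compound whose best similarity to every reference molecule is low would count as out-of-domain with respect to that reference dataset. The threshold for "low" is a modelling decision that this sketch does not fix.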