Effects of aleatoric and epistemic errors in reference data on the learnability and quality of NN-based potential energy surfaces

Sugata Goswami, Silvan Käser, Raymond J. Bemish, Markus Meuwly
Journal: Artificial intelligence chemistry. Published: 2023-12-08. DOI: 10.1016/j.aichem.2023.100033
Full text: https://www.sciencedirect.com/science/article/pii/S2949747723000337
Citations: 0

Abstract

This work assesses how noise in the input data affects the learning of neural-network-based potential energy surfaces (PESs) for chemical applications. Noise in energies and forces can result from aleatoric and epistemic errors in the quantum chemical reference calculations. Statistical (aleatoric) noise arises, for example, from the need to set convergence thresholds in the self-consistent field (SCF) iterations, whereas systematic (epistemic) noise is due to, inter alia, particular choices of basis sets in the calculations. The two molecules considered here as proxies are H2CO and HONO, which are examples of single- and multi-reference problems, respectively, for geometries around the minimum energy structure. For H2CO, adding noise to energies and forces with magnitudes representative of single-point calculations does not deteriorate the quality of the final PESs, whereas increasing the noise to levels commensurate with electronic structure calculations for more complicated (e.g. metal-containing) systems is expected to have a more notable effect. For HONO, which requires a multi-reference treatment, a clear correlation is found between model quality and the degree of multi-reference character as measured by the T1 amplitude. It is concluded that for chemically "simple" cases the effect of aleatoric and epistemic errors is manageable without evident deterioration of the trained model, but more care is needed when multi-reference effects are present.
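
The abstract does not say how the noise was generated, so the following is only a minimal sketch of one way such an experiment could be set up. It assumes zero-mean Gaussian noise applied independently to every energy and force component. The function name `add_gaussian_noise`, the noise magnitudes, and the all-zero toy arrays are hypothetical placeholders, not values or code from the paper.

```python
import numpy as np


def add_gaussian_noise(energies, forces, sigma_e, sigma_f, seed=0):
    """Return noisy copies of reference energies and forces.

    Assumption (not from the paper): zero-mean Gaussian noise,
    drawn independently for every energy and force component.
    """
    rng = np.random.default_rng(seed)
    energies = np.asarray(energies, dtype=float)
    forces = np.asarray(forces, dtype=float)
    noisy_e = energies + rng.normal(0.0, sigma_e, size=energies.shape)
    noisy_f = forces + rng.normal(0.0, sigma_f, size=forces.shape)
    return noisy_e, noisy_f


# Toy reference set: 1000 structures of a 4-atom molecule.
# The zeros are placeholders standing in for real ab initio data.
n_structures, n_atoms = 1000, 4
ref_e = np.zeros(n_structures)
ref_f = np.zeros((n_structures, n_atoms, 3))

# Hypothetical noise magnitudes, in whatever units the reference data uses.
noisy_e, noisy_f = add_gaussian_noise(ref_e, ref_f, sigma_e=1e-3, sigma_f=1e-2)

# The sample spread of the perturbation should be close to the requested sigma.
print("energy noise std:", noisy_e.std())
print("force noise std: ", noisy_f.std())
```

Training the same network on `ref_e`/`ref_f` and on the perturbed copies, then comparing test errors, is one way to probe the sensitivity the abstract describes. A fixed seed keeps the perturbation reproducible across runs.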

Source journal: Artificial intelligence chemistry (Chemistry, General)