QeMFi: A Multifidelity Dataset of Quantum Chemical Properties of Diverse Molecules.

IF 7.2 | Zone 2 | Multidisciplinary journals | Q1 MULTIDISCIPLINARY SCIENCES | Scientific Data | Pub Date: 2025-02-03 | DOI: 10.1038/s41597-024-04247-3
Vivin Vinod, Peter Zaspel
Scientific Data 12(1), 202. Published 2025-02-03. Full-text PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11791055/pdf/

Abstract

Progress in both Machine Learning (ML) and Quantum Chemistry (QC) methods has resulted in high-accuracy ML models for QC properties. Datasets such as MD17 and WS22 have been used to benchmark these models at a given level of QC method, or fidelity, which refers to the accuracy of the chosen QC method. Multifidelity ML (MFML) methods, where models are trained on data from more than one fidelity, have been shown to be more effective than single-fidelity methods. Much research is progressing in this direction for diverse applications ranging from energy band gaps to excitation energies. One hurdle for effective research here is the lack of a diverse multifidelity dataset for benchmarking. We provide the Quantum chemistry MultiFidelity (QeMFi) dataset, consisting of five fidelities calculated with the TD-DFT formalism. The fidelities differ in their basis set choice: STO-3G, 3-21G, 6-31G, def2-SVP, and def2-TZVP. QeMFi offers the community a variety of QC properties, such as vertical excitation properties and molecular dipole moments. Furthermore, QeMFi provides QC computation times, allowing for a time-benefit benchmark of multifidelity models for ML-QC.
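To make the idea of training on more than one fidelity concrete, here is a minimal two-fidelity sketch in the style of Δ-learning. It is not the authors' MFML method and it does not use QeMFi data: the functions `f_high` and `f_low`, the sample counts, and the kernel settings are all hypothetical stand-ins chosen for illustration. A cheap model is fitted on many low-fidelity samples, a correction is fitted on a few high-fidelity samples, and the sum is compared against a model that sees only the few high-fidelity samples.

```python
import math

def kernel(a, b, length_scale=0.2):
    """Gaussian (RBF) kernel between two scalar inputs."""
    return math.exp(-((a - b) ** 2) / (2.0 * length_scale ** 2))

def solve(matrix, rhs):
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x

def krr_fit(xs, ys, reg=1e-6):
    """Kernel ridge regression weights: solve (K + reg * I) w = y."""
    k = [[kernel(a, b) + (reg if i == j else 0.0) for j, b in enumerate(xs)]
         for i, a in enumerate(xs)]
    return solve(k, ys)

def krr_predict(xs_train, weights, x):
    """Evaluate the fitted kernel model at a single point x."""
    return sum(w * kernel(x, xt) for w, xt in zip(weights, xs_train))

# Hypothetical stand-ins for an expensive and a cheap QC method (assumptions).
def f_high(x):
    return math.sin(8.0 * x) + x

def f_low(x):
    return math.sin(8.0 * x) + 0.7 * x - 0.1

def grid(n):
    """n evenly spaced points on [0, 1]."""
    return [i / (n - 1) for i in range(n)]

x_low, x_high, x_test = grid(64), grid(4), grid(200)

# Single-fidelity baseline: only the few high-fidelity samples.
w_sf = krr_fit(x_high, [f_high(x) for x in x_high])

# Two-fidelity model: low-fidelity fit plus a learned high-minus-low correction.
w_low = krr_fit(x_low, [f_low(x) for x in x_low])
w_delta = krr_fit(x_high, [f_high(x) - f_low(x) for x in x_high])

def mean_abs_error(predict):
    return sum(abs(predict(x) - f_high(x)) for x in x_test) / len(x_test)

mae_sf = mean_abs_error(lambda x: krr_predict(x_high, w_sf, x))
mae_mf = mean_abs_error(
    lambda x: krr_predict(x_low, w_low, x) + krr_predict(x_high, w_delta, x))

print(f"single-fidelity MAE: {mae_sf:.4f}")
print(f"two-fidelity MAE:    {mae_mf:.4f}")
```

In this toy setting the correction is smooth, so a handful of expensive samples is enough to learn it. With a real multifidelity dataset, the recorded computation times could be used to compare the two models at equal data-generation cost rather than at equal numbers of high-fidelity samples.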

Source journal: Scientific Data
Scientific Data Social Sciences-Education
CiteScore: 11.20
Self-citation rate: 4.10%
Articles published: 689
Review time: 16 weeks
Journal description: Scientific Data is an open-access journal focused on data, publishing descriptions of research datasets and articles on data sharing across natural sciences, medicine, engineering, and social sciences. Its goal is to enhance the sharing and reuse of scientific data, encourage broader data sharing, and acknowledge those who share their data. The journal primarily publishes Data Descriptors, which offer detailed descriptions of research datasets, including data collection methods and technical analyses validating data quality. These descriptors aim to facilitate data reuse rather than to test hypotheses or present new interpretations, methods, or in-depth analyses.