QMe14S: A Comprehensive and Efficient Spectral Data Set for Small Organic Molecules
Mingzhi Yuan, Zihan Zou, Yi Luo, Jun Jiang, Wei Hu
The Journal of Physical Chemistry Letters, published 2025-04-13
DOI: 10.1021/acs.jpclett.5c00839 (https://doi.org/10.1021/acs.jpclett.5c00839)
Abstract
Developing machine learning protocols for molecular simulations requires comprehensive and efficient data sets. Here we introduce the QMe14S data set, comprising 186,102 small organic molecules featuring 14 elements (H, B, C, N, O, F, Al, Si, P, S, Cl, As, Se, and Br) and 47 functional groups. Using density functional theory at the B3LYP/TZVP level, we optimized the geometries and calculated properties, including energy, atomic charge, atomic force, dipole moment, quadrupole moment, polarizability, octupole moment, first hyperpolarizability, and Hessian. At the same level, we obtained the harmonic IR, Raman, and NMR spectra. Furthermore, we conducted ab initio molecular dynamics simulations to generate dynamic configurations and extract nonequilibrium properties, including energy, forces, and Hessians. By leveraging our E(3)-equivariant message-passing neural network (DetaNet), we demonstrated that models trained on QMe14S outperform those trained on the previously developed QM9S data set in simulating molecular spectra. The QMe14S data set thus serves as a comprehensive benchmark for molecular simulations, offering valuable insights into structure–property relationships.
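The abstract lists Hessians and harmonic spectra among the computed properties. As an illustration of how the two are connected in the standard harmonic approximation, the following minimal sketch mass-weights a Cartesian Hessian and diagonalizes it to obtain harmonic wavenumbers. It is not the authors' pipeline: the function name `harmonic_wavenumbers`, the `dim` parameter, and the one-dimensional toy diatomic (force constant and masses) are hypothetical choices made only for this example, and the sketch assumes a Hessian in Hartree/Bohr² with masses in amu.

```python
import numpy as np

# CODATA constants in SI units, used only for unit conversion.
HARTREE_J = 4.3597447222e-18
BOHR_M = 5.29177210903e-11
AMU_KG = 1.66053906660e-27
C_CM_PER_S = 2.99792458e10

# sqrt(Hartree / (Bohr^2 * amu)) expressed as a wavenumber in cm^-1.
AU_TO_WAVENUMBER = np.sqrt(HARTREE_J / (BOHR_M**2 * AMU_KG)) / (2.0 * np.pi * C_CM_PER_S)


def harmonic_wavenumbers(hessian, masses_amu, dim=3):
    """Return harmonic wavenumbers (cm^-1) from a Cartesian Hessian.

    hessian    : (dim*N, dim*N) array, assumed to be in Hartree/Bohr^2.
    masses_amu : length-N sequence of atomic masses in amu.
    dim        : Cartesian components per atom (3 for real molecules).

    Negative eigenvalues (imaginary modes) are reported as negative numbers.
    Translations and rotations are not projected out, so they appear as
    near-zero entries.
    """
    hessian = np.asarray(hessian, dtype=float)
    m = np.repeat(np.asarray(masses_amu, dtype=float), dim)
    # Mass-weighting: H_ij / sqrt(m_i * m_j).
    mass_weighted = hessian / np.sqrt(np.outer(m, m))
    # Symmetrize to suppress numerical noise before diagonalizing.
    eigvals = np.linalg.eigvalsh(0.5 * (mass_weighted + mass_weighted.T))
    return np.sign(eigvals) * np.sqrt(np.abs(eigvals)) * AU_TO_WAVENUMBER


# Toy check: a 1D diatomic with a hypothetical force constant k.
k = 0.5                    # Hartree/Bohr^2 (illustrative value only)
masses = [1.008, 34.969]   # amu (illustrative values only)
toy_hessian = np.array([[k, -k], [-k, k]])
freqs = harmonic_wavenumbers(toy_hessian, masses, dim=1)

# Analytical result for comparison: omega = sqrt(k / mu), mu = reduced mass.
mu = masses[0] * masses[1] / (masses[0] + masses[1])
expected = np.sqrt(k / mu) * AU_TO_WAVENUMBER
```

For the toy system, the single nonzero entry of `freqs` should agree with the analytical value `expected`, and the remaining entry corresponds to free translation. For a real molecule the same function would be called with the full 3N × 3N Hessian and `dim=3`.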
About the journal:
The Journal of Physical Chemistry (JPC) Letters is devoted to reporting new and original experimental and theoretical basic research of interest to physical chemists, biophysical chemists, chemical physicists, physicists, material scientists, and engineers. An important criterion for acceptance is that the paper reports a significant scientific advance and/or physical insight such that rapid publication is essential. Two issues of JPC Letters are published each month.