Pub Date: 2025-01-02 DOI: 10.1021/acs.jctc.4c01429
Paige E Bowling, Dustin R Broderick, John M Herbert
Fragment-based quantum chemistry methods offer a means to sidestep the steep nonlinear scaling of electronic structure calculations so that large molecular systems can be investigated using high-level methods. Here, we use fragmentation to compute protein-ligand interaction energies in systems with several thousand atoms, using a new software platform for managing fragment-based calculations that implements a screened many-body expansion. Convergence tests using a minimal-basis semiempirical method (HF-3c) indicate that two-body calculations, with single-residue fragments and simple hydrogen caps, are sufficient to reproduce interaction energies obtained using conventional supramolecular electronic structure calculations, to within 1 kcal/mol at about 1% of the computational cost. We also demonstrate that the HF-3c results are illustrative of trends obtained with density functional theory in basis sets up to augmented quadruple-ζ quality. Strategic deployment of fragmentation facilitates the use of converged biomolecular model systems alongside high-quality electronic structure methods and basis sets, bringing ab initio quantum chemistry to systems of hitherto unimaginable size. This will be useful for generation of high-quality training data for machine learning applications.
{"title":"Convergent Protocols for Computing Protein-Ligand Interaction Energies Using Fragment-Based Quantum Chemistry.","authors":"Paige E Bowling, Dustin R Broderick, John M Herbert","doi":"10.1021/acs.jctc.4c01429","DOIUrl":"https://doi.org/10.1021/acs.jctc.4c01429","journal":{"name":"Journal of Chemical Theory and Computation","volume":" ","pages":""},"PeriodicalIF":5.7,"publicationDate":"2025-01-02","publicationTypes":"Journal Article","isOpenAccess":false}
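To make the two-body truncation mentioned in the abstract above concrete, here is a minimal Python sketch. It is not the authors' software platform. The energy function `toy_energy`, the point-particle "fragments", and the optional distance `cutoff` used for screening are all illustrative assumptions, and hydrogen caps are omitted entirely. The sketch relies only on the general fact that, at the two-body level, the protein-internal terms cancel between the complex and the isolated protein, so the interaction energy reduces to a sum of fragment-ligand dimer terms.

```python
import itertools
import math


def toy_energy(atoms):
    """Toy pairwise-additive energy (sum of -1/r^6 terms).

    Stand-in for an electronic structure call (assumption, not HF-3c or DFT).
    """
    return sum(-1.0 / math.dist(a, b) ** 6
               for a, b in itertools.combinations(atoms, 2))


def min_distance(frag_a, frag_b):
    """Smallest interatomic distance between two fragments."""
    return min(math.dist(a, b) for a in frag_a for b in frag_b)


def two_body_interaction(fragments, ligand, energy=toy_energy, cutoff=None):
    """Two-body estimate of E(complex) - E(protein) - E(ligand).

    Sums E(fragment + ligand) - E(fragment) - E(ligand) over fragments.
    Fragments farther than `cutoff` from the ligand are skipped, which is a
    hypothetical distance-based screening rule chosen for illustration.
    """
    e_ligand = energy(ligand)
    total = 0.0
    for frag in fragments:
        if cutoff is not None and min_distance(frag, ligand) > cutoff:
            continue  # screened out: contribution assumed negligible
        total += energy(frag + ligand) - energy(frag) - e_ligand
    return total


def supramolecular_interaction(fragments, ligand, energy=toy_energy):
    """Conventional reference: one calculation on the whole complex."""
    protein = [atom for frag in fragments for atom in frag]
    return energy(protein + ligand) - energy(protein) - energy(ligand)


if __name__ == "__main__":
    fragments = [[(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
                 [(3.0, 0.0, 0.0), (4.0, 0.0, 0.0)],
                 [(20.0, 0.0, 0.0)]]
    ligand = [(1.5, 2.0, 0.0), (2.5, 2.0, 0.0)]
    print("supramolecular:", supramolecular_interaction(fragments, ligand))
    print("two-body      :", two_body_interaction(fragments, ligand))
    print("two-body, screened:", two_body_interaction(fragments, ligand, cutoff=10.0))
```

Because the toy energy is strictly pairwise additive, the two-body sum matches the supramolecular reference up to rounding. With a real electronic structure method the agreement is approximate, which is what the convergence tests in the abstract quantify.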
Pub Date: 2024-12-26 DOI: 10.1021/acs.jctc.4c01237
Chunhui Xie, Rui Li, Yunqi Li, Haibo Xie, Qibin Liu
Missing data in tabular data sets is ubiquitous in statistical analysis, big data analysis, and machine learning studies. Many strategies have been proposed to impute missing data, but their reliability has not been stringently assessed in materials science. Here, we carried out a benchmark test of six imputation strategies (Mean, MissForest, HyperImpute, Gain, Sinkhorn, and a newly proposed MatImpute) on seven representative data sets in materials science. The imputation-induced errors (IIEs) were evaluated through the difference between imputed and original values, using root mean square error (RMSE), Wasserstein distance (WD), and a newly introduced metric, data set correlation convergence (DCC), which measure the difference in three aspects: individual data, column-wise distribution, and correlation stability of a data set. MatImpute outperformed the others with the least RMSE and WD and the highest DCC. The IIE increases with the missing-data ratio and follows the order missing at random < missing completely at random ≤ missing not at random, considering inherent correlations among missing data. A similar increase of IIE was observed along the central departure distance, in units of the standard deviation, which is consistent with the increase of difficulty from interpolation to extrapolation. In further tests of IIE in regression and classification machine learning predictive models, MatImpute also preserved the highest data recovery fidelity. We released the code of MatImpute to facilitate the construction of high-quality data sets in materials science.
{"title":"Imputation of Missing Data in Materials Science through Nearest Neighbors and Iterative Predictions.","authors":"Chunhui Xie, Rui Li, Yunqi Li, Haibo Xie, Qibin Liu","doi":"10.1021/acs.jctc.4c01237","DOIUrl":"https://doi.org/10.1021/acs.jctc.4c01237","journal":{"name":"Journal of Chemical Theory and Computation","volume":"23 1","pages":""},"PeriodicalIF":5.5,"publicationDate":"2024-12-26","publicationTypes":"Journal Article","isOpenAccess":false}
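The two standard error measures named in the abstract above, RMSE and Wasserstein distance, can be sketched in a few lines of Python. This is not the released MatImpute code. The helper names, the choice to evaluate RMSE only on the masked entries, and the use of simple mean imputation as the strategy under test are assumptions made for illustration. The DCC metric is specific to the paper and is not attempted here.

```python
import numpy as np


def rmse(original, imputed):
    """Root mean square error between original and imputed values."""
    original = np.asarray(original, dtype=float)
    imputed = np.asarray(imputed, dtype=float)
    return float(np.sqrt(np.mean((original - imputed) ** 2)))


def wasserstein_1d(u, v):
    """1-D Wasserstein-1 distance between two equal-size samples.

    For equally weighted samples of the same size this equals the mean
    absolute difference between the sorted values.
    """
    u = np.sort(np.asarray(u, dtype=float))
    v = np.sort(np.asarray(v, dtype=float))
    if u.shape != v.shape:
        raise ValueError("this sketch only handles equal-size samples")
    return float(np.mean(np.abs(u - v)))


def mean_impute(column, missing_mask):
    """Fill the masked entries with the mean of the observed entries."""
    column = np.asarray(column, dtype=float)
    missing_mask = np.asarray(missing_mask, dtype=bool)
    filled = column.copy()
    filled[missing_mask] = column[~missing_mask].mean()
    return filled


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    column = rng.normal(size=200)        # hypothetical complete column
    mask = rng.random(200) < 0.2         # roughly 20% missing completely at random
    filled = mean_impute(column, mask)
    # Point-wise error on the entries that were actually imputed
    print("RMSE:", rmse(column[mask], filled[mask]))
    # Distribution-level error over the whole column
    print("WD  :", wasserstein_1d(column, filled))
```

Mean imputation collapses every missing entry onto a single value, so it tends to distort the column-wise distribution even when the point-wise error looks moderate. That is the kind of effect a distribution-level metric such as WD is meant to expose.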
Pub Date: 2024-12-25 DOI: 10.1021/acs.jctc.4c01361
Xiaohui Wang, Zuo-Yuan Zhang, Xiao He, Zhirong Liu, Zhaoxi Sun
Accurate modeling of host-guest systems is challenging in modern computational chemistry. It requires intermolecular interaction patterns to be correctly described and, more importantly, the dynamic behaviors of macrocyclic hosts to be accurately modeled. Pillar[n]arenes as a crucial family of macrocycles play a critical role in host-guest chemistry and biomedical applications. The carboxylated form with 6 or 7 repeating units is of high popularity due to increased solubility and the compatibility between cavity size and drugs. While prefitted transferable force fields are dominantly applied in host-guest modeling, their reliability and accuracy for macrocyclic hosts remain unjustified. In the current work, based on solid numerical evidence about energetics and dynamics, we prove that all transferable force fields fail to provide a correct description of host dynamics for the most popular carboxylated pillararenes. Therefore, all existing simulation reports on this host family could be biased due to the unsuitability of the force-field description. Such huge modeling problems do not occur in other host families that are relatively rigid (e.g., octa acids and cucurbiturils), highlighting the difficulties in modeling pillararene host-guest interactions. To pursue the true picture of the pillararene dynamics and host-guest binding, we fit high-quality molecule-specific parameters for the carboxylated pillararene based on ab initio calculations and perform an exhaustive conformational search of host-guest binding modes with advanced sampling techniques. We provide estimates of binding thermodynamics, report the true dynamic behavior of the WP6 host in the bound and unbound states, and reveal a general multimodal binding behavior of pillararene host-guest complexes. The current work serves as a critical step toward a reliable all-atom description of pillararene host-guest coordination.
{"title":"True Dynamics of Pillararene Host-Guest Binding.","authors":"Xiaohui Wang, Zuo-Yuan Zhang, Xiao He, Zhirong Liu, Zhaoxi Sun","doi":"10.1021/acs.jctc.4c01361","DOIUrl":"https://doi.org/10.1021/acs.jctc.4c01361","journal":{"name":"Journal of Chemical Theory and Computation","volume":"63 1","pages":""},"PeriodicalIF":5.5,"publicationDate":"2024-12-25","publicationTypes":"Journal Article","isOpenAccess":false}
Pub Date: 2024-12-24 Epub Date: 2024-12-07 DOI: 10.1021/acs.jctc.4c00938
William Dawson, Katsuhisa Ozaki, Jens Domke, Takahito Nakajima
The abundant demand for deep learning compute resources has created a renaissance in low-precision hardware. Going forward, it will be essential for simulation software to run on this new generation of machines without sacrificing scientific fidelity. In this paper, we examine the precision requirements of a representative kernel from quantum chemistry calculations: the calculation of the single-particle density matrix from a given mean-field Hamiltonian (i.e., Hartree-Fock or density functional theory) represented in an LCAO basis. We find that double precision affords an unnecessarily high level of precision, leading to optimization opportunities. We show how an approximation built from an error-free matrix multiplication transformation can be used to potentially accelerate this kernel on future hardware. Our results provide a roadmap for adapting quantum chemistry software for the next generation of high-performance computing platforms.
{"title":"Reducing Numerical Precision Requirements in Quantum Chemistry Calculations.","authors":"William Dawson, Katsuhisa Ozaki, Jens Domke, Takahito Nakajima","doi":"10.1021/acs.jctc.4c00938","DOIUrl":"10.1021/acs.jctc.4c00938","journal":{"name":"Journal of Chemical Theory and Computation","volume":" ","pages":"10826-10837"},"PeriodicalIF":5.7,"publicationDate":"2024-12-24","publicationTypes":"Journal Article","isOpenAccess":false}
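The kernel discussed in the abstract above, building a single-particle density matrix from a mean-field Hamiltonian, can be probed for precision sensitivity with a small Python experiment. This sketch does not implement the error-free matrix multiplication transformation from the paper. It assumes an orthonormal basis (overlap matrix equal to the identity), uses plain diagonalization via `numpy.linalg.eigh`, and takes a dimerized tight-binding chain (`dimerized_chain`, a hypothetical toy model with a finite gap at half filling) in place of a real Hartree-Fock or DFT Hamiltonian.

```python
import numpy as np


def dimerized_chain(n_sites, t1=1.0, t2=0.5):
    """Toy Hamiltonian: open chain with alternating hoppings t1, t2.

    Illustrative assumption only; with an even number of sites and t1 > t2
    the spectrum is gapped at half filling.
    """
    h = np.zeros((n_sites, n_sites))
    for i in range(n_sites - 1):
        t = t1 if i % 2 == 0 else t2
        h[i, i + 1] = h[i + 1, i] = -t
    return h


def density_matrix(hamiltonian, n_occupied, dtype=np.float64):
    """D = C_occ C_occ^T from the n_occupied lowest eigenvectors.

    Assumes an orthonormal basis, so no overlap matrix appears. All
    arithmetic is carried out in the requested floating-point type.
    """
    h = np.asarray(hamiltonian, dtype=dtype)
    _, vectors = np.linalg.eigh(h)       # eigenvalues returned in ascending order
    c_occ = vectors[:, :n_occupied]
    return c_occ @ c_occ.T


if __name__ == "__main__":
    h = dimerized_chain(20)
    d64 = density_matrix(h, 10, np.float64)
    d32 = density_matrix(h, 10, np.float32)
    print("trace (float64):", np.trace(d64))
    print("max |D64 - D32|:", np.max(np.abs(d64 - d32.astype(np.float64))))
```

Comparing the two results element by element gives a rough sense of how much accuracy is lost by dropping to single precision for a gapped toy system. Whether that loss is acceptable for a production calculation depends on the system and the target accuracy, which is the question the paper examines.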
Pub Date: 2024-12-24 Epub Date: 2024-12-13 DOI: 10.1021/acs.jctc.4c01283
Kaifang Huang, Lili Duan, John Z H Zhang
Accurate calculation of solvation energies has long fascinated researchers, but complex interactions within bulk water molecules pose significant challenges. Currently, molecular solvation energy calculations are mostly based on implicit solvent approximations in which the solvent molecules are treated as continuum dielectric media. However, the implicit solvent approach is not ideal because it lacks certain real solvation effects, such as that of the first solvation shell, etc. Here, we propose an explicit solvent approach, interaction-reorganization solvation (IRS) method, for molecular solvation energy calculations. The IRS approach achieves predictive accuracy comparable to that of the widely recognized solvation model based on the density (SMD) method and is significantly more accurate than that of the Poisson-Boltzmann/generalized Born surface area (PB/GBSA) methods. This is demonstrated in both the correlation coefficient and the mean absolute error (MAE) with respect to the experimental data. The IRS method is based on molecular dynamics simulation in explicit solvent and does not need to solve Poisson-Boltzmann or Schrödinger equations. On the other hand, the accuracy of the IRS method does depend on the accuracy of the molecular force field used in MD simulations. We expect that the IRS method will be very useful for the solvation energy calculations of molecules.
{"title":"From Implicit to Explicit: An Interaction-Reorganization Approach to Molecular Solvation Energy.","authors":"Kaifang Huang, Lili Duan, John Z H Zhang","doi":"10.1021/acs.jctc.4c01283","DOIUrl":"10.1021/acs.jctc.4c01283","journal":{"name":"Journal of Chemical Theory and Computation","volume":" ","pages":"10961-10971"},"PeriodicalIF":5.7,"publicationDate":"2024-12-24","publicationTypes":"Journal Article","isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11674157/pdf/","PMCID":"OA"}
Pub Date: 2024-12-24 Epub Date: 2024-12-09 DOI: 10.1021/acs.jctc.4c01267
Henry K Tran, Leah P Weisburn, Minsik Cho, Shaun Weatherly, Hong-Zhou Ye, Troy Van Voorhis
Quantum embedding methods are powerful tools to exploit the locality of electron correlation, but thus far many wave function-in-wave function methods have focused on small (e.g., minimal) basis sets. One major challenge for extended basis sets lies in defining consistent atom- or fragment-localized orbitals in spite of the larger spatial extent of the underlying atomic orbitals. In this work, we modify a particular form of quantum embedding, bootstrap embedding (BE), to the case of extended basis sets. We find that using intrinsic atomic orbital (IAO) localization schemes alongside BE converges to ∼99.7% of the CCSD correlation energy in 3-21G, 6-311G, and cc-pVDZ basis sets for reasonably sized fragments. These results mark an important first step in extending the success of embedding methods to properly studying dynamic correlation.
{"title":"Bootstrap Embedding for Molecules in Extended Basis Sets.","authors":"Henry K Tran, Leah P Weisburn, Minsik Cho, Shaun Weatherly, Hong-Zhou Ye, Troy Van Voorhis","doi":"10.1021/acs.jctc.4c01267","DOIUrl":"10.1021/acs.jctc.4c01267","journal":{"name":"Journal of Chemical Theory and Computation","volume":" ","pages":"10912-10921"},"PeriodicalIF":5.7,"publicationDate":"2024-12-24","publicationTypes":"Journal Article","isOpenAccess":false}
Pub Date: 2024-12-24 Epub Date: 2024-12-11 DOI: 10.1021/acs.jctc.4c01253
Victor Wen-Zhe Yu, Yu Jin, Giulia Galli, Marco Govoni
We present a massively parallel GPU-accelerated implementation of the Bethe-Salpeter equation (BSE) for the calculation of the vertical excitation energies (VEEs) and optical absorption spectra of condensed and molecular systems, starting from single-particle eigenvalues and eigenvectors obtained with density functional theory. The algorithms adopted here circumvent the slowly converging sums over empty and occupied states and the inversion of large dielectric matrices through a density matrix perturbation theory approach and a low-rank decomposition of the screened Coulomb interaction, respectively. Further computational savings are achieved by exploiting the nearsightedness of the density matrix of semiconductors and insulators to reduce the number of screened Coulomb integrals. We scale our calculations to thousands of GPUs with a hierarchical loop and data distribution strategy. The efficacy of our method is demonstrated by computing the VEEs of several spin defects in wide-band-gap materials, showing that supercells with up to 1000 atoms are necessary to obtain converged results. We discuss the validity of the common approximation that solves the BSE with truncated sums over empty and occupied states. We then apply our GW-BSE implementation to a diamond lattice with 1727 atoms to study the symmetry breaking of triplet states caused by the interaction of a point defect with an extended line defect.
{"title":"GPU-Accelerated Solution of the Bethe-Salpeter Equation for Large and Heterogeneous Systems.","authors":"Victor Wen-Zhe Yu, Yu Jin, Giulia Galli, Marco Govoni","doi":"10.1021/acs.jctc.4c01253","DOIUrl":"10.1021/acs.jctc.4c01253","journal":{"name":"Journal of Chemical Theory and Computation","volume":" ","pages":"10899-10911"},"PeriodicalIF":5.7,"publicationDate":"2024-12-24","publicationTypes":"Journal Article","isOpenAccess":false}
Pub Date: 2024-12-24 DOI: 10.1021/acs.jctc.4c01197
Jintu Zhang, Luigi Bonati, Enrico Trizio, Odin Zhang, Yu Kang, TingJun Hou, Michele Parrinello
Enhanced sampling simulations make the computational study of rare events feasible. A large family of such methods crucially depends on the definition of some collective variables (CVs) that could provide a low-dimensional representation of the relevant physics of the process. Recently, many methods have been proposed to semiautomatize the CV design by using machine learning tools to learn the variables directly from the simulation data. However, most methods are based on feedforward neural networks and require some user-defined physical descriptors. Here, we propose bypassing this step using a graph neural network to directly use the atomic coordinates as input for the CV model. This way, we achieve a fully automatic approach to CV determination that provides variables invariant under the relevant symmetries, especially the permutational one. Furthermore, we provide different analysis tools to favor the physical interpretation of the final CV. We prove the robustness of our approach using different methods from the literature for the optimization of the CV, and we prove its efficacy on several systems, including a small peptide, an ion dissociation in explicit solvent, and a simple chemical reaction.
{"title":"Descriptor-Free Collective Variables from Geometric Graph Neural Networks.","authors":"Jintu Zhang, Luigi Bonati, Enrico Trizio, Odin Zhang, Yu Kang, TingJun Hou, Michele Parrinello","doi":"10.1021/acs.jctc.4c01197","DOIUrl":"10.1021/acs.jctc.4c01197","journal":{"name":"Journal of Chemical Theory and Computation","volume":" ","pages":"10787-10797"},"PeriodicalIF":5.7,"publicationDate":"2024-12-24","publicationTypes":"Journal Article","isOpenAccess":false}
Pub Date: 2024-12-24 Epub Date: 2024-12-12 DOI: 10.1021/acs.jctc.4c00990
José Guadalupe Rosas Jiménez, Balázs Fábián, Gerhard Hummer
The need for short time steps currently limits routine atomistic molecular dynamics (MD) simulations to the microsecond time scale. For long time steps, the numerical integration of the equations of motion becomes unstable, resulting in catastrophic crashes. Here, we combine mass repartitioning and rescaling to construct a water model that increases the sampling efficiency in biomolecular simulations without compromising integration stability and with preserved structural and thermodynamic properties. The resulting "fast water" is then used with a time step as before in combination with standard force fields. The reduced water viscosity and faster diffusion result in proportionally faster sampling of the larger-scale motions in the conformation space of both solute and solvent. We illustrate this approach by developing TIP3P-F based on the popular TIP3P model of water. A roughly 2-fold boost in the sampling efficiency at minimal cost in accuracy is substantial and helps lower the energy impact of large-scale MD simulations. The approach is general and can readily be applied to other water models and different types of solvents.
{"title":"Faster Sampling in Molecular Dynamics Simulations with TIP3P-F Water.","authors":"José Guadalupe Rosas Jiménez, Balázs Fábián, Gerhard Hummer","doi":"10.1021/acs.jctc.4c00990","DOIUrl":"10.1021/acs.jctc.4c00990","journal":{"name":"Journal of Chemical Theory and Computation","volume":" ","pages":"11068-11081"},"PeriodicalIF":5.7,"publicationDate":"2024-12-24","publicationTypes":"Journal Article","isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11672673/pdf/","PMCID":"OA"}
Pub Date : 2024-12-24 Epub Date: 2024-12-13 DOI: 10.1021/acs.jctc.4c01240
Nathaniel L Kitzmiller, Mitchell E Lahm, Laura N Olive Dornshuld, Jincan Jin, Wesley D Allen, Henry F Schaefer III
The concordant mode approach (CMA) is a promising new scheme for dramatically increasing the system size and level of theory achievable in quantum chemical computations of molecular vibrational frequencies. Here, we achieve advances in the CMA hierarchy by computations targeting CCSD(T)/cc-pVTZ (coupled cluster singles and doubles with perturbative triples using a correlation-consistent polarized-valence triple-ζ basis set) benchmarks within the G2 molecular test set, executing a statistical analysis for 1501 frequencies from 111 compounds and then separately solving the refractory case of pyridine. First, MP2/cc-pVTZ (second-order Møller-Plesset perturbation theory with the same basis set) proves to be an excellent and preferred choice for generating the underlying (Level B) normal modes of the CMA scheme. Utilizing this Level B within the CMA-0A method reproduces the 1501 benchmark frequencies with a mean absolute error (MAE) of only 0.11 cm⁻¹ and an attendant standard deviation of 0.49 cm⁻¹. Second, a convergent CMA-2 method is constituted that allows efficient computation of higher level (Level A) frequencies to any reasonable accuracy threshold by using only Hartree-Fock (HF) and MP2 or density functional theory (DFT) data to generate ξ parameters, which select the sparse off-diagonal force field elements for explicit evaluation at Level A. When Level B = MP2/cc-pVTZ, a cutoff of ξ = 0.02 provides an average maximum absolute error per molecule of only 0.17 cm⁻¹ by incurring merely a 33% increase in average cost over CMA-0A. This CMA-2 method also eradicates the 4 problematic CMA-0A outliers of pyridine with even less effort (ξ = 0.04, 22% increase). Finally, the newly developed CMA procedures are shown to be highly successful when applied to 1-(1H-pyrrol-3-yl)ethanol, a new test molecule with diverse types of vibration.
{"title":"Convergent Concordant Mode Approach for Molecular Vibrations: CMA-2.","authors":"Nathaniel L Kitzmiller, Mitchell E Lahm, Laura N Olive Dornshuld, Jincan Jin, Wesley D Allen, Henry F Schaefer III","doi":"10.1021/acs.jctc.4c01240","DOIUrl":"10.1021/acs.jctc.4c01240","url":null,"PeriodicalId":45,"journal":{"name":"Journal of Chemical Theory and Computation","volume":" ","pages":"10886-10898"},"PeriodicalIF":5.7,"publicationDate":"2024-12-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11673116/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142821530","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"Chemistry","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}