The study explores how well machine learning and structural fingerprints can predict two spectroscopic properties of ice: OH vibrational frequencies and ¹H chemical shifts. A large theoretical data set (55 ice polymorphs, 1010 DFT data points for both the vibrations and the NMR shifts) and a smaller cross-validation set are employed. The Message Passing Atomic Cluster Expansion (MACE) model performs best, with high accuracy: a root-mean-square deviation (RMSD) of 0.06 ppm for the chemical shifts and ∼10 cm⁻¹ for the vibrational frequencies. Simpler descriptors such as atom-centered symmetry functions (ACSF) and the smooth overlap of atomic positions (SOAP), when paired with suitable regressors, nearly match MACE's performance. At the other end of the complexity scale, the simplest possible physics-based descriptor of the environment, a single H-bond distance, yields RMSD values three times as large for the vibrations and four times as large for the proton chemical shifts as the MACE model. Depending on the context, these RMSD values may still be considered modest and useful, given the gain in simplicity and transparency.
Title: Machine-Learning Ice Spectra: From 1 to 256 Features | Shokirbek Shermukhamedov, Jolla Kullgren, Daniel Sethio, Kersti Hermansson | Journal of Chemical Theory and Computation | Pub Date: 2026-02-04 | DOI: 10.1021/acs.jctc.5c01413
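A minimal sketch of the two quantities behind the comparison above: the RMSD used to score the models, and the simplest model in the study's range, a regression on a single scalar descriptor such as an H-bond distance. The straight-line form, the helper names `rmsd` and `fit_single_descriptor`, and the toy numbers are assumptions for illustration only; they are not the authors' regressors or data.

```python
import numpy as np


def rmsd(pred, ref):
    """Root-mean-square deviation between predicted and reference values."""
    pred = np.asarray(pred, dtype=float)
    ref = np.asarray(ref, dtype=float)
    return float(np.sqrt(np.mean((pred - ref) ** 2)))


def fit_single_descriptor(x, y):
    """Least-squares straight line y ~ a*x + b on one scalar descriptor x
    (for example an H-bond distance). Returns (a, b).
    The linear form is an assumption made for this sketch."""
    a, b = np.polyfit(np.asarray(x, dtype=float), np.asarray(y, dtype=float), 1)
    return float(a), float(b)


if __name__ == "__main__":
    # Hypothetical toy values (not data from the study):
    # H...O distances in angstrom and OH stretching frequencies in cm^-1.
    r_hb = [1.70, 1.75, 1.80, 1.85, 1.90]
    freq = [3080.0, 3150.0, 3215.0, 3270.0, 3330.0]
    a, b = fit_single_descriptor(r_hb, freq)
    pred = [a * r + b for r in r_hb]
    print(f"slope = {a:.1f} cm^-1/A, RMSD = {rmsd(pred, freq):.1f} cm^-1")
```

A one-feature model like this is fully transparent (its slope has a direct physical reading), which is the trade-off against accuracy that the abstract highlights.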
Pub Date: 2026-02-04 | DOI: 10.1021/acs.jctc.5c01892
Daniel F. Calero-Osorio, Paul W. Ayers
We show how to add the effects of residual electron correlation to a seniority-zero reference wave function by transforming the true electronic Hamiltonian into seniority-zero form. The transformation is treated via the Baker–Campbell–Hausdorff (BCH) expansion: the seniority-zero structure of the reference is exploited to evaluate the first three commutators exactly, and the remaining contributions are handled with a recursive commutator approximation, as is typical in canonical transformation methods. Because the reference is seniority-zero and the computation is parallelized, the method is practical for small- to medium-sized systems. Numerical tests show high accuracy, with errors of ∼10⁻⁴ hartree.
Title: Seniority-Zero Canonical Transformation Theory: Error Reduction via Late Truncation | Journal of Chemical Theory and Computation
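In one common convention, the BCH expansion of the transformed Hamiltonian reads e^(−A) H e^(A) = H + [H, A] + (1/2!)[[H, A], A] + (1/3!)[[[H, A], A], A] + …. The toy sketch below, under stated assumptions, uses small dense matrices in place of second-quantized operators to show how keeping more nested commutators before truncating reduces the error. It does not exploit seniority-zero structure or implement the recursive commutator approximation; `bch_truncated` and the other helpers are hypothetical names.

```python
import numpy as np


def commutator(x, y):
    """[X, Y] = XY - YX for dense matrices."""
    return x @ y - y @ x


def expm_taylor(a, terms=40):
    """Matrix exponential by a plain Taylor series (adequate for small ||A||)."""
    result = np.eye(a.shape[0])
    term = np.eye(a.shape[0])
    for k in range(1, terms):
        term = term @ a / k
        result = result + term
    return result


def bch_truncated(h, a, order):
    """Approximate exp(-A) H exp(A) by H + [H,A] + [[H,A],A]/2! + ...,
    keeping nested commutators up to the given order."""
    total = h.copy()
    nested = h.copy()
    factorial = 1.0
    for n in range(1, order + 1):
        nested = commutator(nested, a)
        factorial *= n
        total = total + nested / factorial
    return total


def truncation_errors(h, a, orders):
    """Frobenius-norm error of each truncated series against the exact transform."""
    exact = expm_taylor(-a) @ h @ expm_taylor(a)
    return {n: float(np.linalg.norm(bch_truncated(h, a, n) - exact)) for n in orders}


if __name__ == "__main__":
    # Toy operators (hypothetical): a diagonal "Hamiltonian" and a small
    # real antisymmetric (hence anti-Hermitian) generator A.
    h = np.diag([0.0, 1.0, 2.0, 3.0])
    k = np.diag(np.ones(3), 1)
    a = 0.1 * (k - k.T)
    for n, err in truncation_errors(h, a, [1, 2, 3, 4, 8]).items():
        print(f"order {n}: error {err:.2e}")
```

In the real method the cost of each extra commutator grows quickly, which is why only the first few are evaluated exactly and the rest are approximated.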
Pub Date: 2026-02-04 | DOI: 10.1021/acs.jctc.5c01817
Arnab Bachhar, Nicholas J. Mayhall
Transition metal complexes present significant challenges for electronic structure theory due to strong electron correlation arising from partially filled d orbitals. We compare our recently developed Tensor Product Selected Configuration Interaction (TPSCI) method with the Density Matrix Renormalization Group (DMRG) for computing exchange coupling constants in six transition metal systems, including dinuclear Cr, Fe, and Mn complexes and a tetranuclear Ni-cubane. TPSCI uses a basis of locally correlated tensor product states to capture the electronic structure efficiently while maintaining interpretability. From calculations on active spaces ranging from (22e,29o) to (42e,49o), we find that TPSCI consistently yields higher variational energies than DMRG, owing to truncation of the local cluster states, but provides magnetic exchange coupling constants (J) generally within 10–30 cm⁻¹ of the DMRG results. Key advantages include a natural multistate capability, which enables direct extrapolation of J with smaller statistical errors, and computational efficiency for challenging systems. However, cluster state truncation is a fundamental limitation that requires careful convergence testing, particularly for large local cluster dimensions. We identify specific failure cases where current truncation schemes break down, highlighting the need for improved cluster state selection methods and distributed-memory implementations to realize TPSCI's full potential for strongly correlated systems.
Title: Computing Exchange Coupling Constants in Transition Metal Complexes with Tensor Product Selected Configuration Interaction | Journal of Chemical Theory and Computation
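One way to use several spin-state energies at once, in the spirit of the multistate capability mentioned above, is a least-squares fit of J to a Heisenberg model. The sketch assumes a dinuclear complex and the convention H = −2J S₁·S₂, for which E(S) = E₀ − J·S(S + 1) (other papers use −J or +J, which rescales or flips J). It is not the authors' extrapolation procedure, the helper name `fit_exchange_coupling` is hypothetical, and the example energies are synthetic.

```python
import numpy as np


def fit_exchange_coupling(spins, energies_cm):
    """Least-squares fit of J (cm^-1) to several spin-state energies of a dimer.

    Assumes the Heisenberg convention H = -2J S1.S2, for which
    E(S) = E0 - J * S * (S + 1); J < 0 means antiferromagnetic coupling.
    Returns (J, E0).
    """
    s = np.asarray(spins, dtype=float)
    x = s * (s + 1.0)
    slope, intercept = np.polyfit(x, np.asarray(energies_cm, dtype=float), 1)
    return float(-slope), float(intercept)


if __name__ == "__main__":
    # Hypothetical Cr(III)-Cr(III) dimer (S1 = S2 = 3/2, so S = 0..3),
    # energies in cm^-1 generated from J = -10 cm^-1 for illustration only.
    spins = [0, 1, 2, 3]
    energies = [0.0, 20.0, 60.0, 120.0]
    j, e0 = fit_exchange_coupling(spins, energies)
    print(f"J = {j:.2f} cm^-1")
```

Fitting all states together averages out state-specific errors, which is one reason multistate data can give J with smaller statistical uncertainty than a single energy gap. The tetranuclear Ni-cubane would need a multi-center Heisenberg model instead.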
Pub Date: 2026-02-03 | DOI: 10.1021/acs.jctc.5c01733
D. Vale Cofer-Shabica, Jennifer R. DeRosa, Joseph E. Subotnik
Marcus theory is the workhorse of theoretical chemistry for predicting the rates of charge and energy transfer. For the famous set of naphthalene-bridge-biphenyl and naphthalene-bridge-benzophenone systems studied by Piotrowiak, Miller, and Closs, Marcus theory agrees overwhelmingly with experiment, for both electron transfer and triplet energy transfer. The agreement is not perfect, however, and in this manuscript we revisit one key point of disagreement: the molecule C-13-ae ([3,equatorial]-naphthalene-cyclohexane-[1,axial]-benzophenone). To better understand this theory–experiment discrepancy, we introduce E-SHAKE, a novel scheme for sampling the seam between two diabatic electronic states, and use it to reveal both the breakdown of the Condon approximation and the presence of a conical intersection for C-13-ae; we also predict an isotope effect on the rate of triplet–triplet energy transfer.
Title: Marcus Theory and the Condon Approximation Revisited I: E-SHAKE and Seam Sampling | Journal of Chemical Theory and Computation
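For reference, the standard nonadiabatic Marcus rate, in which the Condon approximation holds (the coupling H_AB is taken as independent of nuclear geometry), is k = (2π/ħ)|H_AB|² (4πλk_BT)^(−1/2) exp[−(ΔG° + λ)²/(4λk_BT)]. The sketch below simply evaluates this textbook expression; E-SHAKE and the non-Condon analysis are not reproduced, `marcus_rate` is a hypothetical helper, and the parameter values are illustrative.

```python
import math

HBAR = 1.054571817e-34   # reduced Planck constant, J s
KB = 1.380649e-23        # Boltzmann constant, J/K
EV = 1.602176634e-19     # joules per electronvolt


def marcus_rate(coupling_ev, dg_ev, reorg_ev, temperature=298.15):
    """Nonadiabatic Marcus rate (s^-1) with a constant (Condon) coupling.

    coupling_ev: |H_AB| in eV; dg_ev: reaction free energy in eV;
    reorg_ev: reorganization energy lambda in eV.
    """
    hab = coupling_ev * EV
    dg = dg_ev * EV
    lam = reorg_ev * EV
    kt = KB * temperature
    prefactor = 2.0 * math.pi / HBAR * hab ** 2
    franck_condon = math.exp(-(dg + lam) ** 2 / (4.0 * lam * kt)) / math.sqrt(
        4.0 * math.pi * lam * kt
    )
    return prefactor * franck_condon


if __name__ == "__main__":
    # Hypothetical parameters: |H_AB| = 1 meV, lambda = 0.5 eV.
    for dg in (-0.25, -0.5, -1.0):
        print(f"dG = {dg:+.2f} eV -> k = {marcus_rate(1e-3, dg, 0.5):.3e} s^-1")
```

Because |H_AB| enters only as a constant prefactor here, any geometry dependence of the coupling, which is what a Condon breakdown means, cannot be captured by this expression.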
Pub Date: 2026-02-03 | DOI: 10.1021/acs.jctc.5c01704
Yue Yu, Francesco Calcagno, Haote Li, Victor S. Batista
We introduce a variational quantum autoencoder tailored for de novo molecular design, named QO-BRA (Quantum Operator-Based Real Amplitude autoencoder). QO-BRA uses quantum circuits for real-amplitude encoding and the SWAP test to estimate the reconstruction and latent-space regularization errors during back-propagation. Adjoint encoder and decoder operators enable unitary transformations and a generative process that ensures accurate reconstruction as well as the novelty, uniqueness, and validity of the generated samples. We showcase QO-BRA on the de novo design of Ca²⁺-, Mg²⁺-, and Zn²⁺-binding metalloproteins after training the generative model on a modest data set.
Title: QO-BRA: A Quantum Operator-Based Autoencoder for De Novo Molecular Design | Journal of Chemical Theory and Computation
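The SWAP test mentioned above estimates the overlap of two states: after a Hadamard on an ancilla, a controlled-SWAP of the two registers, and a second Hadamard, the ancilla reads |0⟩ with probability (1 + |⟨ψ|φ⟩|²)/2. The sketch below simulates that circuit classically with NumPy state vectors. How QO-BRA turns the overlap into reconstruction and regularization losses is not shown, and `swap_test_p0` and `overlap_squared` are hypothetical names.

```python
import numpy as np


def swap_test_p0(psi, phi):
    """State-vector simulation of the SWAP test on registers |psi> and |phi>.

    Returns the probability of measuring the ancilla in |0>,
    which equals (1 + |<psi|phi>|^2) / 2.
    """
    psi = np.asarray(psi, dtype=complex)
    phi = np.asarray(phi, dtype=complex)
    psi = psi / np.linalg.norm(psi)
    phi = phi / np.linalg.norm(phi)
    joint = np.outer(psi, phi)          # amplitude of |i>|j> is psi_i * phi_j
    # First Hadamard on the ancilla: both branches carry joint / sqrt(2).
    branch0 = joint / np.sqrt(2.0)
    branch1 = joint / np.sqrt(2.0)
    # Controlled-SWAP: exchange the registers in the ancilla=|1> branch.
    branch1 = branch1.T
    # Second Hadamard: the ancilla=|0> amplitude is (branch0 + branch1) / sqrt(2).
    final0 = (branch0 + branch1) / np.sqrt(2.0)
    return float(np.sum(np.abs(final0) ** 2))


def overlap_squared(p0):
    """Invert P(0) = (1 + |<psi|phi>|^2) / 2 to recover the squared overlap."""
    return 2.0 * p0 - 1.0


if __name__ == "__main__":
    # Real-valued (real-amplitude) two-qubit states, hypothetical values.
    psi = [0.5, 0.5, 0.5, 0.5]
    phi = [1.0, 0.0, 0.0, 1.0]
    p0 = swap_test_p0(psi, phi)
    print(f"P(0) = {p0:.4f}, |<psi|phi>|^2 = {overlap_squared(p0):.4f}")
```

On hardware P(0) would be estimated from repeated measurements, so the recovered overlap carries shot noise that this exact simulation omits.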
Pub Date: 2026-02-02 | DOI: 10.1021/acs.jctc.5c01328
Ziying Yuan, Neil Qiang Su
Accurate and transferable exchange–correlation (XC) functionals are central to the predictive power of density functional theory (DFT). However, conventional parameter optimization of XC functionals is typically performed using single-objective or stepwise strategies, which may lead to imbalanced performance across chemically diverse systems. This work introduces a multi-objective optimization framework, termed EBI4MO (explicit-by-implicit for multi-objectives), that enables simultaneous and consistent optimization with respect to multiple performance criteria. EBI4MO constructs a hierarchy of implicit functions that couple interdependent parameter groups across objectives, allowing sequential yet interlinked parameter updates. As a demonstration, EBI4MO is applied to optimize the parameters in hybrid XC functionals with dispersion corrections, using the GMTKN55 benchmark database. Two objectives are considered: minimizing the overall prediction error and achieving uniform improvement relative to B3LYP-D3(BJ), a widely used and balanced functional. The resulting functionals demonstrate consistent and balanced performance across all benchmark subsets, outperforming functionals optimized via conventional single-objective or stepwise methods. These results highlight the effectiveness and generality of EBI4MO, offering a new strategy for functional development and broader multi-objective optimization problems in computational chemistry.
Title: Multi-Objective Optimization of Approximate Functionals via Implicit Interdependency Modeling | Journal of Chemical Theory and Computation
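The two objectives described above can pull in different directions. The sketch below, under loudly stated assumptions, scores candidate functionals by (1) a plain mean of per-subset errors and (2) the worst ratio of candidate to reference error across subsets (below 1 means every subset improved on the reference), then checks Pareto dominance. EBI4MO's hierarchy of implicit functions is not reproduced; the objective forms, helper names, subset names, and numbers are all hypothetical and are not GMTKN55 results.

```python
def overall_error(errors):
    """Objective 1 (assumed form): plain mean of per-subset errors."""
    return sum(errors.values()) / len(errors)


def worst_relative_error(errors, reference):
    """Objective 2 (assumed form): largest candidate/reference error ratio
    over all subsets; a value below 1 means every subset improved."""
    return max(errors[name] / reference[name] for name in reference)


def dominates(a, b):
    """Pareto dominance for minimization: a is no worse than b in every
    objective and strictly better in at least one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


if __name__ == "__main__":
    # Hypothetical per-subset errors (kcal/mol) for a reference functional
    # and two candidate parametrizations.
    reference = {"subset_A": 2.0, "subset_B": 1.0, "subset_C": 4.0}
    cand1 = {"subset_A": 1.5, "subset_B": 0.9, "subset_C": 3.0}
    cand2 = {"subset_A": 1.0, "subset_B": 1.2, "subset_C": 2.0}
    scores = {}
    for name, errs in (("cand1", cand1), ("cand2", cand2)):
        scores[name] = (overall_error(errs), worst_relative_error(errs, reference))
        print(name, scores[name])
    print("cand1 dominates cand2:", dominates(scores["cand1"], scores["cand2"]))
    print("cand2 dominates cand1:", dominates(scores["cand2"], scores["cand1"]))
```

In this toy case a single-objective search on the mean error alone would prefer `cand2`, even though it is worse than the reference on one subset; neither candidate dominates the other, which is exactly the imbalance a multi-objective treatment is meant to expose.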
Pub Date: 2026-02-02 | DOI: 10.1021/acs.jctc.5c01426
William Livernois, Olaiyan Alolaiyan, Arpan De, M. P. Anantram
Force fields were developed for metal-mediated DNA (mmDNA) structures, using ab initio methods to parametrize the metal coordination. Two mmDNA structures were considered, using cytosine/thymine mismatches with coordinated Ag and Hg atoms. These base pairs were parametrized with the proposed computational framework and subjected to multiple validation steps. Simulations with the generated force fields showed enhanced structural stability within the metalated base pairs, with the coordinated metal rotating into the major groove. We also find a larger propeller angle for the metalated base pair, in agreement with previously reported experimental data. The developed force fields have the potential to unveil the structural dynamics of long metalated DNA nanowires, although results have so far been demonstrated on a chain of three mmDNA base pairs.
Title: Scalable Force Fields for Metal-Mediated DNA Nanostructures | Journal of Chemical Theory and Computation
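One basic ingredient of parametrizing metal coordination from ab initio data is fitting a harmonic bond term to a quantum-chemical energy scan. The sketch below does this with a simple quadratic fit; it stands in for, and is not necessarily, the authors' framework. It assumes the AMBER-style form E = k(r − r₀)² + E₀ (no factor of 1/2), and `fit_harmonic_bond` and the scan values are hypothetical.

```python
import numpy as np


def fit_harmonic_bond(distances, energies):
    """Fit E(r) = k * (r - r0)**2 + E0 to a bond-length scan.

    Uses the AMBER-style convention without a factor of 1/2 (assumption).
    Returns (k, r0, E0) in the units of the inputs.
    """
    a, b, c = np.polyfit(np.asarray(distances, dtype=float),
                         np.asarray(energies, dtype=float), 2)
    k = a                    # coefficient of r^2
    r0 = -b / (2.0 * a)      # since b = -2 k r0
    e0 = c - a * r0 ** 2     # since c = k r0^2 + E0
    return float(k), float(r0), float(e0)


if __name__ == "__main__":
    # Hypothetical metal-N scan: distances in angstrom, energies in kcal/mol.
    r = np.linspace(1.9, 2.3, 9)
    e = 150.0 * (r - 2.1) ** 2 + 0.5
    k, r0, e0 = fit_harmonic_bond(r, e)
    print(f"k = {k:.1f} kcal/mol/A^2, r0 = {r0:.3f} A")
```

A real scan is anharmonic, so the fitted k depends on the distance window; restricting the scan to a narrow range around the minimum keeps the harmonic term meaningful.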
Pub Date: 2026-02-02 | DOI: 10.1021/acs.jctc.5c01455
Jinzhe Ma, Xiaoyan Fu, Wenbo Xie, P. Hu
Universal machine learning interatomic potentials (uMLIPs) represent a significant advance in interatomic potential modeling, offering remarkable predictive accuracy across a wide range of chemical systems. However, their use in catalytic reaction simulations is limited by their insufficient accuracy in describing reactions, especially in predicting reaction barriers. In this study, we evaluate two established uMLIPs and use fine-tuning strategies to improve their predictions for catalytic reactions. We systematically compared the predictive accuracy, data efficiency, and generalization capability of two approaches, fine-tuning and training from scratch, using the accuracy of the original pretrained uMLIPs as a baseline. Specifically, we evaluated the approaches across a range of tasks, from relatively simple applications such as molecular dynamics (MD) simulations and adsorption energy calculations to more complex challenges such as transition state searches. We also analyzed model performance across training set sizes to identify the critical data threshold needed for accurate reaction predictions. Additionally, we assessed the extrapolative generalization of the models by examining how fine-tuning improves predictive accuracy for unseen elements in both simple and complex tasks. Our results show that fine-tuning the uMLIPs significantly improves the accuracy of reaction energy predictions, reducing the mean absolute error (MAE) to 0.09 eV, compared with 0.38 eV for the original uMLIPs. Notably, the fine-tuned models require only 10%–30% of the data used for training from scratch while delivering stable and reliable performance. Moreover, the generalization capabilities of the uMLIPs were preserved after fine-tuning. This approach shows significant promise for extending the applicability of uMLIPs to diverse catalytic reaction systems.
Title: From Pretrained to Precision: Fine-Tuning Universal Interatomic Potentials for Accurate Catalytic Reaction Simulations | Journal of Chemical Theory and Computation
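The comparison above rests on two simple computations: the MAE of predicted versus reference reaction energies, and, from a learning curve, the smallest training-set size that reaches a target accuracy (which gives the data fraction of fine-tuning relative to training from scratch). The sketch below shows both; the helper names, the target, and the learning-curve numbers are hypothetical and are not the paper's results.

```python
import numpy as np


def mae(pred, ref):
    """Mean absolute error between predicted and reference values."""
    pred = np.asarray(pred, dtype=float)
    ref = np.asarray(ref, dtype=float)
    return float(np.mean(np.abs(pred - ref)))


def smallest_sufficient_size(learning_curve, target_mae):
    """Smallest training-set size whose MAE is at or below target_mae.

    learning_curve maps training-set size -> MAE. Returns None if no size
    reaches the target. Non-monotonic curves are not smoothed.
    """
    for size in sorted(learning_curve):
        if learning_curve[size] <= target_mae:
            return size
    return None


if __name__ == "__main__":
    # Hypothetical learning curves (MAE in eV) for illustration only.
    fine_tuned = {50: 0.30, 100: 0.15, 200: 0.08}
    from_scratch = {100: 0.50, 400: 0.20, 1000: 0.07}
    n_ft = smallest_sufficient_size(fine_tuned, 0.1)
    n_fs = smallest_sufficient_size(from_scratch, 0.1)
    print(f"fine-tuned: {n_ft}, from scratch: {n_fs}, fraction: {n_ft / n_fs:.0%}")
```

Defining the threshold as the first size that meets the target is a simple choice; a stricter variant would require all larger sizes to meet it too.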
Pub Date: 2026-02-02 | DOI: 10.1021/acs.jctc.5c01925
Yu Zhang
We present a reduced-scaling auxiliary-field quantum Monte Carlo (AFQMC) framework designed for large molecular systems and ensembles, with or without coupling to optical cavities. Our approach exploits the natural block sparsity of the Cholesky decomposition (CD) of the electron repulsion integrals in molecular ensembles and employs tensor hypercontraction (THC) to compress low-rank Cholesky blocks efficiently. By representing the Cholesky vectors in a mixed format, keeping high-rank blocks in block-sparse form and compressing low-rank blocks with THC, we reduce the scaling of the exchange-energy evaluation from quartic to robustly cubic in the number of molecular orbitals N, while lowering the memory requirement from cubic toward quadratic. Benchmark analyses on one-, two-, and three-dimensional molecular ensembles (up to ∼1,200 orbitals) show that (a) the number of nonzeros in the Cholesky tensors grows linearly with system size in all dimensions; (b) the average numerical rank increases sublinearly and does not saturate at these sizes; and (c) the block ranks are heterogeneous, with some blocks nearly full rank and many low rank, which naturally motivates the proposed mixed block-sparsity/THC scheme for efficient calculation of the exchange energy. We demonstrate that the mixed scheme yields cubic wall-time scaling with favorable prefactors while preserving AFQMC accuracy.
Title: Scalable Quantum Monte Carlo Method for Polariton Chemistry via Mixed Block Sparsity and Tensor Hypercontraction Method | Journal of Chemical Theory and Computation
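The mixed format described above hinges on one decision per block: is its numerical rank low enough to compress? The sketch below measures numerical rank with an SVD, labels each block "compress" or "keep-sparse", and compresses low-rank blocks with a truncated SVD. The truncated SVD is a plain stand-in for THC, not THC itself; treating each Cholesky block as a 2D matrix, the 50% rank cutoff, and all helper names are assumptions.

```python
import numpy as np


def numerical_rank(block, rel_tol=1e-8):
    """Number of singular values above rel_tol times the largest one."""
    s = np.linalg.svd(block, compute_uv=False)
    if s.size == 0 or s[0] == 0.0:
        return 0
    return int(np.sum(s > rel_tol * s[0]))


def classify_blocks(blocks, low_rank_fraction=0.5, rel_tol=1e-8):
    """Label each block 'compress' (low rank) or 'keep-sparse' (high rank)."""
    labels = {}
    for key, block in blocks.items():
        full = min(block.shape)
        rank = numerical_rank(block, rel_tol)
        labels[key] = "compress" if rank <= low_rank_fraction * full else "keep-sparse"
    return labels


def low_rank_factor(block, rank):
    """Truncated-SVD factors (U, V) with block ~ U @ V.T; a stand-in for THC."""
    u, s, vt = np.linalg.svd(block, full_matrices=False)
    return u[:, :rank] * s[:rank], vt[:rank].T


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Hypothetical blocks: one generic (full rank), one rank-2.
    blocks = {
        "mol1-mol1": rng.normal(size=(6, 6)),
        "mol1-mol2": rng.normal(size=(6, 2)) @ rng.normal(size=(2, 8)),
    }
    for key, label in classify_blocks(blocks).items():
        print(key, numerical_rank(blocks[key]), label)
```

Storing a rank-r factorization of an m×n block costs r(m + n) numbers instead of mn, which is where the memory saving for the many low-rank blocks comes from.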
Pub Date: 2026-02-02 | DOI: 10.1021/acs.jctc.5c01767
Moritz R. Schäfer, Johannes Kästner
In this work, we present enhanced representation-based sampling (ERBS), a novel enhanced sampling method designed to generate structurally diverse training data sets for machine-learned interatomic potentials. ERBS automatically identifies collective variables by dimensionality reduction of atomic descriptors and applies a bias potential inspired by the on-the-fly probability enhanced sampling (OPES) framework. We highlight the ability of Gaussian moment descriptors to capture collective molecular motions and explore the impact of the biasing parameters, using alanine dipeptide as a benchmark system. We show that free energy surfaces can be reconstructed with high fidelity using only short biased trajectories as training data. Further, we apply the method to the iterative (active-learning) construction of a liquid water data set, training models with and without enhanced sampling (on molecular dynamics and on ERBS data), and compare the quality of the simulated self-diffusion coefficients. The resulting self-diffusion coefficients closely match those simulated with a reference model at a significantly reduced data set size. Finally, we compare the sampling behavior of enhanced sampling methods by benchmarking the mean squared displacements of BMIM⁺BF₄⁻ trajectories simulated with uncertainty-driven dynamics and with ERBS, and find that the latter significantly increases the exploration of configurational space.
Title: Enhanced Representation-Based Sampling for the Efficient Generation of Data Sets for Machine-Learned Interatomic Potentials | Journal of Chemical Theory and Computation
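Both validation metrics above come from the same trajectory analysis: the mean squared displacement (MSD), averaged over atoms and time origins, and the self-diffusion coefficient from the Einstein relation MSD ≈ 6Dt in three dimensions. The sketch below computes both from unwrapped coordinates. The ERBS bias itself is not reproduced; the helper names, array layout, and units are assumptions, and fitting all lags is a simplification (in practice one fits only the linear, diffusive regime).

```python
import numpy as np


def mean_squared_displacement(positions, max_lag):
    """MSD(lag) averaged over atoms and time origins, for lag = 0..max_lag.

    positions: array of shape (n_frames, n_atoms, 3) with unwrapped coordinates.
    """
    pos = np.asarray(positions, dtype=float)
    msd = np.zeros(max_lag + 1)
    for lag in range(1, max_lag + 1):
        disp = pos[lag:] - pos[:-lag]
        msd[lag] = np.mean(np.sum(disp ** 2, axis=-1))
    return msd


def einstein_diffusion(msd, dt):
    """Self-diffusion coefficient from a linear fit of MSD = 6 D t (3D),
    using all lags >= 1 spaced by the frame interval dt."""
    lags = np.arange(len(msd))
    slope, _ = np.polyfit(lags[1:] * dt, msd[1:], 1)
    return float(slope / 6.0)


if __name__ == "__main__":
    # Hypothetical Gaussian random walk: 200 atoms, 500 frames.
    rng = np.random.default_rng(1)
    sigma, dt = 0.1, 1.0
    steps = rng.normal(scale=sigma, size=(500, 200, 3))
    traj = np.cumsum(steps, axis=0)
    msd = mean_squared_displacement(traj, 50)
    # For this walk, D should come out near sigma**2 / (2 * dt).
    print(f"D = {einstein_diffusion(msd, dt):.4f}")
```

A larger MSD at a given lag indicates that a sampling method moves the system further through configurational space, which is the sense in which the abstract compares ERBS with uncertainty-driven dynamics.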