Carlos Acevedo-Rocha, Lukasz Berlicki, Uwe T. Bornscheuer, Dominic J. Campopiano, Pimchai Chaiyen, Janko Čivić, Zhiqi Cong, Friedrich Johannes Ehinger, Sabine Flitsch, Artur Góra, Marko Hanzevacki, Jeremy N. Harvey, Donald Hilvert, Florian Hollfelder, Amanda G. Jarvis, Bruce R. Lichtenstein, Stefan Lutz, Thomas Malcomson, E. Neil G. Marsh, Neil R. McFarlane, Alexander McKenzie, Adrian Mulholland, Sílvia Osuna, Joelle N. Pelletier, Agata Raczyńska, Gerard Roelfes, Lubomír Rulíšek, Peter Stockinger, Nicholas Turner, Francesca Valetti, Marc Van der Kamp, Mikael Widersten and Cathleen Zeymer
{"title":"Enzyme evolution, engineering and design: mechanism and dynamics: general discussion","authors":"Carlos Acevedo-Rocha, Lukasz Berlicki, Uwe T. Bornscheuer, Dominic J. Campopiano, Pimchai Chaiyen, Janko Čivić, Zhiqi Cong, Friedrich Johannes Ehinger, Sabine Flitsch, Artur Góra, Marko Hanzevacki, Jeremy N. Harvey, Donald Hilvert, Florian Hollfelder, Amanda G. Jarvis, Bruce R. Lichtenstein, Stefan Lutz, Thomas Malcomson, E. Neil G. Marsh, Neil R. McFarlane, Alexander McKenzie, Adrian Mulholland, Sílvia Osuna, Joelle N. Pelletier, Agata Raczyńska, Gerard Roelfes, Lubomír Rulíšek, Peter Stockinger, Nicholas Turner, Francesca Valetti, Marc Van der Kamp, Mikael Widersten and Cathleen Zeymer","doi":"10.1039/D4FD90022G","DOIUrl":"10.1039/D4FD90022G","url":null,"abstract":"","PeriodicalId":49075,"journal":{"name":"Faraday Discussions","volume":"252 ","pages":" 127-156"},"PeriodicalIF":3.4,"publicationDate":"2024-08-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142034579","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"化学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Johanna P. Carbone, Andreas Irmler, Alejandro Gallo, Tobias Schäfer, William Z. Van Benschoten, James J. Shepherd and Andreas Grüneis
We present an application of periodic coupled-cluster theory to the calculation of CO adsorption energies on the Pt(111) surface for different adsorption sites. The calculations employ a range of recently developed theoretical and computational methods. In particular, we use a recently introduced coupled-cluster ansatz, denoted as CCSD(cT), to compute correlation energies of the metallic Pt surface with and without adsorbed CO molecules. The convergence of Hartree–Fock adsorption energy contributions with respect to randomly shifted k-meshes is discussed. Recently introduced basis set incompleteness error corrections make it possible to achieve well-converged correlation energy contributions to the adsorption energies. We show that CCSD(cT) theory predicts the correct order of adsorption energies for the considered adsorption sites. Furthermore, we find that binding of the CO molecule to the top and fcc site is dominated by Hartree–Fock and correlation energy contributions, respectively.
{"title":"CO adsorption on Pt(111) studied by periodic coupled cluster theory","authors":"Johanna P. Carbone, Andreas Irmler, Alejandro Gallo, Tobias Schäfer, William Z. Van Benschoten, James J. Shepherd and Andreas Grüneis","doi":"10.1039/D4FD00085D","DOIUrl":"10.1039/D4FD00085D","url":null,"abstract":"<p >We present an application of periodic coupled-cluster theory to the calculation of CO adsorption energies on the Pt(111) surface for different adsorption sites. The calculations employ a range of recently developed theoretical and computational methods. In particular, we use a recently introduced coupled-cluster ansatz, denoted as CCSD(cT), to compute correlation energies of the metallic Pt surface with and without adsorbed CO molecules. The convergence of Hartree–Fock adsorption energy contributions with respect to randomly shifted <em>k</em>-meshes is discussed. Recently introduced basis set incompleteness error corrections make it possible to achieve well-converged correlation energy contributions to the adsorption energies. We show that CCSD(cT) theory predicts the correct order of adsorption energies for the considered adsorption sites. Furthermore, we find that binding of the CO molecule to the top and fcc site is dominated by Hartree–Fock and correlation energy contributions, respectively.</p>","PeriodicalId":49075,"journal":{"name":"Faraday Discussions","volume":"254 ","pages":" 586-597"},"PeriodicalIF":3.4,"publicationDate":"2024-08-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11339635/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142015641","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"化学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Magdalena Abramiuk, Carlos Acevedo-Rocha, Abdulrahman Alogaidi, Fraser Armstrong, Amulyasai Bakshi, Uwe T. Bornscheuer, Dominic J. Campopiano, Pimchai Chaiyen, Friedrich Johannes Ehinger, Sabine Flitsch, Jeremy N. Harvey, Donald Hilvert, Amanda G. Jarvis, Rhiannon E. H. Jones, Bruce R. Lichtenstein, Louis Y. P. Luk, Tara C. Lurshay, Thomas Malcomson, E. Neil G. Marsh, Neil R. McFarlane, Alexander McKenzie, Clare F. Megarity, Vicent Moliner, Adrian J. Mulholland, Ben Orton, Joelle N. Pelletier, Agata Raczyńska, Per-Olof Syrén, Sean Adeoti Thompson, Nicholas Turner, Francesca Valetti, Lu Shin Wong and Cathleen Zeymer
{"title":"Biocatalytic pathways, cascades, cells and systems: general discussion","authors":"Magdalena Abramiuk, Carlos Acevedo-Rocha, Abdulrahman Alogaidi, Fraser Armstrong, Amulyasai Bakshi, Uwe T. Bornscheuer, Dominic J. Campopiano, Pimchai Chaiyen, Friedrich Johannes Ehinger, Sabine Flitsch, Jeremy N. Harvey, Donald Hilvert, Amanda G. Jarvis, Rhiannon E. H. Jones, Bruce R. Lichtenstein, Louis Y. P. Luk, Tara C. Lurshay, Thomas Malcomson, E. Neil G. Marsh, Neil R. McFarlane, Alexander McKenzie, Clare F. Megarity, Vicent Moliner, Adrian J. Mulholland, Ben Orton, Joelle N. Pelletier, Agata Raczyńska, Per-Olof Syrén, Sean Adeoti Thompson, Nicholas Turner, Francesca Valetti, Lu Shin Wong and Cathleen Zeymer","doi":"10.1039/D4FD90023E","DOIUrl":"10.1039/D4FD90023E","url":null,"abstract":"","PeriodicalId":49075,"journal":{"name":"Faraday Discussions","volume":"252 ","pages":" 241-261"},"PeriodicalIF":3.4,"publicationDate":"2024-08-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142015640","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"化学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Transformer-based encoder–decoder models have demonstrated impressive results in chemical reaction prediction tasks. However, these models typically rely on pretraining using tens of millions of unlabelled molecules, which can be time-consuming and GPU-intensive. One of the central questions we aim to answer in this work is: can FlanT5 and ByT5, the encoder–decoder models pretrained solely on language data, be effectively specialised for organic reaction prediction through task-specific fine-tuning? We conduct a systematic empirical study on several key issues of the process, including tokenisation, the impact of (SMILES-oriented) pretraining, fine-tuning sample efficiency, and decoding algorithms at inference. Our key findings indicate that although being pretrained only on language tasks, FlanT5 and ByT5 provide a solid foundation to fine-tune for reaction prediction, and thus become ‘chemistry domain compatible’ in the process. This suggests that GPU-intensive and expensive pretraining on a large dataset of unlabelled molecules may be useful yet not essential, to leverage the power of language models for chemistry. All our models achieve comparable Top-1 and Top-5 accuracy although some variation across different models does exist. Notably, tokenisation and vocabulary trimming slightly affect final performance but can speed up training and inference; the most efficient greedy decoding strategy is very competitive while only marginal gains can be achieved from more sophisticated decoding algorithms. In summary, we evaluate FlanT5 and ByT5 across several dimensions and benchmark their impact on organic reaction prediction, which may guide more effective use of these state-of-the-art language models for chemistry-related tasks in the future.
{"title":"Specialising and analysing instruction-tuned and byte-level language models for organic reaction prediction†","authors":"Jiayun Pang and Ivan Vulić","doi":"10.1039/D4FD00104D","DOIUrl":"10.1039/D4FD00104D","url":null,"abstract":"<p >Transformer-based encoder–decoder models have demonstrated impressive results in chemical reaction prediction tasks. However, these models typically rely on pretraining using tens of millions of unlabelled molecules, which can be time-consuming and GPU-intensive. One of the central questions we aim to answer in this work is: can FlanT5 and ByT5, the encoder–decoder models pretrained solely on language data, be effectively specialised for organic reaction prediction through task-specific fine-tuning? We conduct a systematic empirical study on several key issues of the process, including tokenisation, the impact of (SMILES-oriented) pretraining, fine-tuning sample efficiency, and decoding algorithms at inference. Our key findings indicate that although being pretrained only on language tasks, FlanT5 and ByT5 provide a solid foundation to fine-tune for reaction prediction, and thus become ‘chemistry domain compatible’ in the process. This suggests that GPU-intensive and expensive pretraining on a large dataset of unlabelled molecules may be useful yet not essential, to leverage the power of language models for chemistry. All our models achieve comparable Top-1 and Top-5 accuracy although some variation across different models does exist. Notably, tokenisation and vocabulary trimming slightly affect final performance but can speed up training and inference; the most efficient greedy decoding strategy is very competitive while only marginal gains can be achieved from more sophisticated decoding algorithms. In summary, we evaluate FlanT5 and ByT5 across several dimensions and benchmark their impact on organic reaction prediction, which may guide more effective use of these state-of-the-art language models for chemistry-related tasks in the future.</p>","PeriodicalId":49075,"journal":{"name":"Faraday Discussions","volume":"256 ","pages":" 413-433"},"PeriodicalIF":3.4,"publicationDate":"2024-08-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://pubs.rsc.org/en/content/articlepdf/2025/fd/d4fd00104d?page=search","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142208840","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"化学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Gil Alexandrowicz, Dmitri Babikov, Mark Brouard, Alexander Butler, Helen Chadwick, David W. Chandler, Michal Fárník, Jan Fingerhut, Hua Guo, Tibor Győri, Christian T. Haakansson, Dan J. Harding, Dwayne Heard, Brianna R. Heazlewood, David Heathcote, Nils Hertl, Pablo G. Jambrina, Geert-Jan Kroes, Olivia A. Krohn, Paul D. Lane, Viet Le Duc, Heather J. Lewandowski, Jérôme Loreau, Max McCrea, Kenneth G. McKendrick, Jennifer Meyer, Daniel R. Moon, Amy S. Mullin, Gilbert M. Nathanson, Daniel M. Neumark, Kang-Kuen Ni, Nitish Pal, Eva Pluhařová, Christopher Reilly, Patrick Robertson, Steven J. Sibener, Chris Sparling, Vimala Sridurai, Ajeet Srivastav, Matt Strutton, Arthur G. Suits, Joshua Wagner, Peter D. Watson, Roland Wester, Stefan Willitsch, Alec M. Wodtke and Bum Suk Zhao
{"title":"Scattering in extreme environments: general discussion","authors":"Gil Alexandrowicz, Dmitri Babikov, Mark Brouard, Alexander Butler, Helen Chadwick, David W. Chandler, Michal Fárník, Jan Fingerhut, Hua Guo, Tibor Győri, Christian T. Haakansson, Dan J. Harding, Dwayne Heard, Brianna R. Heazlewood, David Heathcote, Nils Hertl, Pablo G. Jambrina, Geert-Jan Kroes, Olivia A. Krohn, Paul D. Lane, Viet Le Duc, Heather J. Lewandowski, Jérôme Loreau, Max McCrea, Kenneth G. McKendrick, Jennifer Meyer, Daniel R. Moon, Amy S. Mullin, Gilbert M. Nathanson, Daniel M. Neumark, Kang-Kuen Ni, Nitish Pal, Eva Pluhařová, Christopher Reilly, Patrick Robertson, Steven J. Sibener, Chris Sparling, Vimala Sridurai, Ajeet Srivastav, Matt Strutton, Arthur G. Suits, Joshua Wagner, Peter D. Watson, Roland Wester, Stefan Willitsch, Alec M. Wodtke and Bum Suk Zhao","doi":"10.1039/D4FD90018A","DOIUrl":"10.1039/D4FD90018A","url":null,"abstract":"","PeriodicalId":49075,"journal":{"name":"Faraday Discussions","volume":"251 ","pages":" 171-204"},"PeriodicalIF":3.4,"publicationDate":"2024-08-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141970078","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"化学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Synthesis of predicted materials is the key and final step needed to realize a vision of computationally accelerated materials discovery. Because so many materials have been previously synthesized, one would anticipate that text-mining synthesis recipes from the literature would yield a valuable dataset to train machine-learning models that can predict synthesis recipes for new materials. Between 2016 and 2019, the corresponding author (Wenhao Sun) participated in efforts to text-mine 31 782 solid-state synthesis recipes and 35 675 solution-based synthesis recipes from the literature. Here, we characterize these datasets and show that they do not satisfy the “4 Vs” of data-science—that is: volume, variety, veracity and velocity. For this reason, we believe that machine-learned regression or classification models built from these datasets will have limited utility in guiding the predictive synthesis of novel materials. On the other hand, these large datasets provided an opportunity to identify anomalous synthesis recipes—which in fact did inspire new hypotheses on how materials form, which we later validated by experiment. Our case study here urges a re-evaluation on how to extract the most value from large historical materials-science datasets.
{"title":"A critical reflection on attempts to machine-learn materials synthesis insights from text-mined literature recipes","authors":"Wenhao Sun and Nicholas David","doi":"10.1039/D4FD00112E","DOIUrl":"10.1039/D4FD00112E","url":null,"abstract":"<p >Synthesis of predicted materials is the key and final step needed to realize a vision of computationally accelerated materials discovery. Because so many materials have been previously synthesized, one would anticipate that text-mining synthesis recipes from the literature would yield a valuable dataset to train machine-learning models that can predict synthesis recipes for new materials. Between 2016 and 2019, the corresponding author (Wenhao Sun) participated in efforts to text-mine 31 782 solid-state synthesis recipes and 35 675 solution-based synthesis recipes from the literature. Here, we characterize these datasets and show that they do not satisfy the “4 Vs” of data-science—that is: volume, variety, veracity and velocity. For this reason, we believe that machine-learned regression or classification models built from these datasets will have limited utility in guiding the predictive synthesis of novel materials. On the other hand, these large datasets provided an opportunity to identify anomalous synthesis recipes—which in fact did inspire new hypotheses on how materials form, which we later validated by experiment. Our case study here urges a re-evaluation on how to extract the most value from large historical materials-science datasets.</p>","PeriodicalId":49075,"journal":{"name":"Faraday Discussions","volume":"256 ","pages":" 614-638"},"PeriodicalIF":3.4,"publicationDate":"2024-08-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://pubs.rsc.org/en/content/articlepdf/2025/fd/d4fd00112e?page=search","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142208843","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"化学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Daniel J. Auerbach, Dmitri Babikov, Alexander Butler, David W. Chandler, Jan Fingerhut, Hua Guo, Dan J. Harding, David Heathcote, Nils Hertl, Bin Jiang, Geert-Jan Kroes, Paul D. Lane, Jérôme Loreau, Stuart R. Mackenzie, Kenneth G. McKendrick, Daniel R. Moon, Gilbert M. Nathanson, Daniel M. Neumark, Rahul Pandey, George C. Schatz, Steven J. Sibener, Ajeet Srivastav, Claire Vallance, Robert A. B. van Bree, Joshua Wagner, Gilbert C. Walker, Peter D. Watson, Stefan Willitsch, Alec M. Wodtke and Bum Suk Zhao
{"title":"Scattering at condensed-phase surfaces: general discussion","authors":"Daniel J. Auerbach, Dmitri Babikov, Alexander Butler, David W. Chandler, Jan Fingerhut, Hua Guo, Dan J. Harding, David Heathcote, Nils Hertl, Bin Jiang, Geert-Jan Kroes, Paul D. Lane, Jérôme Loreau, Stuart R. Mackenzie, Kenneth G. McKendrick, Daniel R. Moon, Gilbert M. Nathanson, Daniel M. Neumark, Rahul Pandey, George C. Schatz, Steven J. Sibener, Ajeet Srivastav, Claire Vallance, Robert A. B. van Bree, Joshua Wagner, Gilbert C. Walker, Peter D. Watson, Stefan Willitsch, Alec M. Wodtke and Bum Suk Zhao","doi":"10.1039/D4FD90020K","DOIUrl":"10.1039/D4FD90020K","url":null,"abstract":"","PeriodicalId":49075,"journal":{"name":"Faraday Discussions","volume":"251 ","pages":" 471-508"},"PeriodicalIF":3.4,"publicationDate":"2024-08-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141915508","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"化学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
F. Javier Aoiz, Nadia Balucani, Astrid Bergeat, Alexander Butler, David W. Chandler, Gábor Czakó, Tibor Győri, Dwayne E. Heard, David Heathcote, Brianna R. Heazlewood, Nils Hertl, Pablo G. Jambrina, Ralf I. Kaiser, Olivia A. Krohn, Viet Le Duc, Jérôme Loreau, Stuart R. Mackenzie, Kenneth G. McKendrick, Jennifer Meyer, Gilbert M. Nathanson, Daniel M. Neumark, Rahul Pandey, Christopher Reilly, Patrick Robertson, George C. Schatz, Steven J. Sibener, Arthur G. Suits, Peter D. Watson, Roland Wester, Stefan Willitsch, Alec M. Wodtke and Bum Suk Zhao
{"title":"Scattering of larger molecules – part 2: general discussion","authors":"F. Javier Aoiz, Nadia Balucani, Astrid Bergeat, Alexander Butler, David W. Chandler, Gábor Czakó, Tibor Győri, Dwayne E. Heard, David Heathcote, Brianna R. Heazlewood, Nils Hertl, Pablo G. Jambrina, Ralf I. Kaiser, Olivia A. Krohn, Viet Le Duc, Jérôme Loreau, Stuart R. Mackenzie, Kenneth G. McKendrick, Jennifer Meyer, Gilbert M. Nathanson, Daniel M. Neumark, Rahul Pandey, Christopher Reilly, Patrick Robertson, George C. Schatz, Steven J. Sibener, Arthur G. Suits, Peter D. Watson, Roland Wester, Stefan Willitsch, Alec M. Wodtke and Bum Suk Zhao","doi":"10.1039/D4FD90021A","DOIUrl":"10.1039/D4FD90021A","url":null,"abstract":"","PeriodicalId":49075,"journal":{"name":"Faraday Discussions","volume":"251 ","pages":" 622-665"},"PeriodicalIF":3.4,"publicationDate":"2024-08-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141905063","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"化学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Harveen Kaur, Flaviano Della Pia, Ilyes Batatia, Xavier R. Advincula, Benjamin X. Shi, Jinggang Lan, Gábor Csányi, Angelos Michaelides and Venkat Kapil
Calculating sublimation enthalpies of molecular crystal polymorphs is relevant to a wide range of technological applications. However, predicting these quantities at first-principles accuracy – even with the aid of machine learning potentials – is a challenge that requires sub-kJ mol−1 accuracy in the potential energy surface and finite-temperature sampling. We present an accurate and data-efficient protocol for training machine learning interatomic potentials by fine-tuning the foundational MACE-MP-0 model and showcase its capabilities on sublimation enthalpies and physical properties of ice polymorphs. Our approach requires only a few tens of training structures to achieve sub-kJ mol−1 accuracy in the sublimation enthalpies and sub-1% error in densities at finite temperature and pressure. Exploiting this data efficiency, we perform preliminary NPT simulations of hexagonal ice at the random phase approximation level and demonstrate a good agreement with experiments. Our results show promise for finite-temperature modelling of molecular crystals with the accuracy of correlated electronic structure theory methods.
{"title":"Data-efficient fine-tuning of foundational models for first-principles quality sublimation enthalpies","authors":"Harveen Kaur, Flaviano Della Pia, Ilyes Batatia, Xavier R. Advincula, Benjamin X. Shi, Jinggang Lan, Gábor Csányi, Angelos Michaelides and Venkat Kapil","doi":"10.1039/D4FD00107A","DOIUrl":"10.1039/D4FD00107A","url":null,"abstract":"<p >Calculating sublimation enthalpies of molecular crystal polymorphs is relevant to a wide range of technological applications. However, predicting these quantities at first-principles accuracy – even with the aid of machine learning potentials – is a challenge that requires sub-kJ mol<small><sup>−1</sup></small> accuracy in the potential energy surface and finite-temperature sampling. We present an accurate and data-efficient protocol for training machine learning interatomic potentials by fine-tuning the foundational MACE-MP-0 model and showcase its capabilities on sublimation enthalpies and physical properties of ice polymorphs. Our approach requires only a few tens of training structures to achieve sub-kJ mol<small><sup>−1</sup></small> accuracy in the sublimation enthalpies and sub-1% error in densities at finite temperature and pressure. Exploiting this data efficiency, we perform preliminary <em>NPT</em> simulations of hexagonal ice at the random phase approximation level and demonstrate a good agreement with experiments. Our results show promise for finite-temperature modelling of molecular crystals with the accuracy of correlated electronic structure theory methods.</p>","PeriodicalId":49075,"journal":{"name":"Faraday Discussions","volume":"256 ","pages":" 120-138"},"PeriodicalIF":3.4,"publicationDate":"2024-08-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://pubs.rsc.org/en/content/articlepdf/2025/fd/d4fd00107a?page=search","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141945613","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"化学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Dmitri Babikov, Nadia Balucani, Astrid Bergeat, Mark Brouard, David W. Chandler, Matthew L. Costen, Michal Fárník, Hua Guo, Tibor Győri, Dwayne Heard, David Heathcote, Nils Hertl, Pablo G. Jambrina, Nathanael M. Kidwell, O. A. Krohn, Viet Le Duc, Jérôme Loreau, Stuart R. Mackenzie, Max McCrea, Kenneth G. McKendrick, Jennifer Meyer, Daniel R. Moon, Amy S. Mullin, Gilbert M. Nathanson, Daniel M. Neumark, Kang-Kuen Ni, Martin J. Paterson, Eva Pluhařová, Patrick Robertson, Christopher Reilly, George C. Schatz, Chris Sparling, Arthur G. Suits, Peter D. Watson, Roland Wester, Stefan Willitsch and Alec M. Wodtke
{"title":"Scattering of larger molecules – part 1: general discussion","authors":"Dmitri Babikov, Nadia Balucani, Astrid Bergeat, Mark Brouard, David W. Chandler, Matthew L. Costen, Michal Fárník, Hua Guo, Tibor Győri, Dwayne Heard, David Heathcote, Nils Hertl, Pablo G. Jambrina, Nathanael M. Kidwell, O. A. Krohn, Viet Le Duc, Jérôme Loreau, Stuart R. Mackenzie, Max McCrea, Kenneth G. McKendrick, Jennifer Meyer, Daniel R. Moon, Amy S. Mullin, Gilbert M. Nathanson, Daniel M. Neumark, Kang-Kuen Ni, Martin J. Paterson, Eva Pluhařová, Patrick Robertson, Christopher Reilly, George C. Schatz, Chris Sparling, Arthur G. Suits, Peter D. Watson, Roland Wester, Stefan Willitsch and Alec M. Wodtke","doi":"10.1039/D4FD90019G","DOIUrl":"10.1039/D4FD90019G","url":null,"abstract":"","PeriodicalId":49075,"journal":{"name":"Faraday Discussions","volume":"251 ","pages":" 313-341"},"PeriodicalIF":3.4,"publicationDate":"2024-08-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141892341","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"化学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}