Beyond theory-driven discovery: introducing hot random search and datum-derived structures
Chris J. Pickard
Faraday Discussions, volume 256, pages 61-84 (published 2024-08-06)
DOI: 10.1039/D4FD00134F
https://pubs.rsc.org/en/content/articlelanding/2025/fd/d4fd00134f
Citation count: 0
Abstract
Data-driven methods have transformed the prospects of the computational chemical sciences, with machine-learned interatomic potentials (MLIPs) speeding up calculations by several orders of magnitude. I reflect on theory-driven, as opposed to data-driven, discovery based on ab initio random structure searching (AIRSS), and then introduce two new methods that exploit machine-learning acceleration. I show how long, high-throughput anneals, performed between direct structural relaxations and enabled by ephemeral data-derived potentials (EDDPs), can be incorporated into AIRSS to bias the sampling of challenging systems towards low-energy configurations. Hot AIRSS (hot-AIRSS) preserves the parallel advantage of random search, while allowing much more complex systems to be tackled. This is demonstrated through searches for complex boron structures in large unit cells. I then show how low-energy carbon structures can be directly generated from a single, experimentally determined, diamond structure. In an extension of the generation of random sensible structures, candidates are stochastically generated and then optimised to minimise the difference between their EDDP environment vectors and that of the reference diamond structure. The distance-based cost function is captured in an actively learned EDDP. Graphite, small nanotubes and caged, fullerene-like structures emerge from searches using this potential, along with a rich variety of tetrahedral framework structures. Using the same approach, the pyrope garnet structure, Mg₃Al₂(SiO₄)₃, is recovered from a low-energy AIRSS structure generated in a smaller unit cell with a different chemical composition. The relationship of this approach to modern diffusion-model-based generative methods is discussed.
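
The hot-AIRSS idea described above (a random start, a direct relaxation, an anneal, then a second relaxation, with every sample independent of the others) can be illustrated with a toy sketch. Everything in it is an assumption made for illustration: the one-dimensional `energy` function stands in for an EDDP energy surface, the fixed-temperature Metropolis walk in `anneal` stands in for the molecular-dynamics anneal, and the names `relax`, `anneal` and `hot_search` are hypothetical. It is not the AIRSS code and does not use a real EDDP.

```python
import math
import random


def energy(x):
    # Toy rugged 1D landscape; a hypothetical stand-in for an EDDP energy.
    return 0.1 * x * x + math.sin(5.0 * x)


def gradient(x):
    # Analytic derivative of the toy energy.
    return 0.2 * x + 5.0 * math.cos(5.0 * x)


def relax(x, step=0.01, iters=2000):
    # Direct relaxation: plain steepest descent to a nearby local minimum.
    for _ in range(iters):
        x -= step * gradient(x)
    return x


def anneal(x, rng, temperature=1.0, steps=500, move=0.3):
    # Fixed-temperature Metropolis walk (assumed stand-in for an MD anneal).
    e = energy(x)
    for _ in range(steps):
        trial = x + rng.uniform(-move, move)
        e_trial = energy(trial)
        # Accept downhill moves always, uphill moves with Boltzmann probability.
        if e_trial <= e or rng.random() < math.exp(-(e_trial - e) / temperature):
            x, e = trial, e_trial
    return x


def hot_search(n_samples, seed=0):
    # Each sample is independent, so the loop could be run in parallel.
    rng = random.Random(seed)
    results = []
    for _ in range(n_samples):
        x_direct = relax(rng.uniform(-10.0, 10.0))  # plain random search
        x_hot = relax(anneal(x_direct, rng))        # relax, anneal, relax
        results.append((x_direct, x_hot))
    return results


if __name__ == "__main__":
    for x_direct, x_hot in hot_search(5):
        print(f"direct E = {energy(x_direct):7.3f}   hot E = {energy(x_hot):7.3f}")
```

Comparing the two energy columns over many samples gives a rough feel for how an anneal between relaxations can shift the sampled minima; on this toy landscape the size of that shift depends entirely on the assumed temperature and step count.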