Many-body Green's function theory in the GW approximation with the Bethe–Salpeter equation (BSE) provides a powerful framework for first-principles calculations of single-particle and electron–hole excitations in perfect crystals and molecules alike. Application to complex molecular systems, for example, solvated dyes, molecular aggregates, thin films, interfaces, or macromolecules, is particularly challenging as they contain a prohibitively large number of atoms. Exploiting the often localized nature of excitations in such disordered systems, several methods have recently been developed in which GW-BSE is applied to a smaller, tractable region of interest that is embedded into an environment described with a lower-level method. Here, we review the various strategies proposed for such embedded many-body Green's function approaches, including quantum–quantum and quantum–classical embeddings, and focus in particular on how they include environment screening effects either intrinsically in the screened Coulomb interaction in the GW and BSE steps or via extrinsic electrostatic couplings.
Embedded Many-Body Green's Function Methods for Electronic Excitations in Complex Molecular Systems. Gianluca Tirimbó, Vivek Sundaram, Björn Baumeier. Wiley Interdisciplinary Reviews: Computational Molecular Science, vol. 14, no. 6. Published 2024-11-17. DOI: 10.1002/wcms.1734. PDF: https://onlinelibrary.wiley.com/doi/epdf/10.1002/wcms.1734
Beyond addressing technological demands, the integration of machine learning (ML) into human societies has also promoted sustainability through the adoption of digitalized protocols. Despite these advantages and the abundance of available toolkits, a substantial implementation gap is preventing the widespread incorporation of ML protocols into the computational and experimental chemistry communities. In this work, we introduce ROBERT, a software carefully crafted to make ML more accessible to chemists of all programming skill levels, while achieving results comparable to those of field experts. We conducted benchmarking using six recent ML studies in chemistry containing from 18 to 4149 entries. Furthermore, we demonstrated the program's ability to initiate workflows directly from SMILES strings, which simplifies the generation of ML predictors for common chemistry problems. To assess ROBERT's practicality in real-life scenarios, we employed it to discover new luminescent Pd complexes with a modest dataset of 23 points, a frequently encountered scenario in experimental studies.
ROBERT: Bridging the Gap Between Machine Learning and Chemistry. David Dalmau, Juan V. Alegre-Requena. Wiley Interdisciplinary Reviews: Computational Molecular Science, vol. 14, no. 5. Published 2024-10-22. DOI: 10.1002/wcms.1733. PDF: https://onlinelibrary.wiley.com/doi/epdf/10.1002/wcms.1733
Shirin Faraji, David Picconi, Elisa Palacino-González
Molecular-level understanding of photoinduced processes is critically important for breakthroughs in transformative technologies utilizing light, ranging from photomedicine to photoresponsive materials. Theory and simulation play a crucial role in this task. Despite great advances in hardware and computational methods, the theoretical description of photoinduced phenomena in the presence of complex environments and external photoexcitation conditions still poses formidable challenges for theoreticians, and numerous formal and computational difficulties must be overcome. The development of predictive, accurate, and at the same time computationally efficient theoretical approaches to describe complex problems in photochemistry and photophysics is an active field of research in contemporary theoretical and computational chemistry. In this advanced review, we discuss modern computational advances and novel approaches that have recently been developed in excited-state electronic structure methods and multiscale modeling, with a special emphasis on coupled electron-nuclear dynamics and spectroscopy, from fully quantum to semiclassical methodologies, including dissipative effects, the explicit light field interaction, femtosecond time-resolved spectroscopy, and software infrastructure.
Advanced quantum and semiclassical methods for simulating photoinduced molecular dynamics and spectroscopy. Shirin Faraji, David Picconi, Elisa Palacino-González. Wiley Interdisciplinary Reviews: Computational Molecular Science, vol. 14, no. 5. Published 2024-10-09. DOI: 10.1002/wcms.1731. PDF: https://onlinelibrary.wiley.com/doi/epdf/10.1002/wcms.1731
Energy-related materials are crucial for advancing energy technologies, improving efficiency, reducing environmental impacts, and supporting sustainable development. Designing and discovering these materials through computational techniques necessitates a comprehensive understanding of the material space, which is defined by the constituent atoms, composition, and structure. Depending on the search space involved in the investigation, computational materials design can be categorized into four primary approaches: atomic substitution in fixed prototype structures, crystal structure prediction (CSP), variable-composition CSP, and inverse design across the entire materials space. This review provides an overview of these paradigms, detailing the concepts, strategies, and applications pertinent to energy-related materials. The progression from first-principles calculations to machine learning techniques is emphasized, with the aim of enhancing understanding and elucidating new advancements in the computational design of energy-related materials.
Computational design of energy-related materials: From first-principles calculations to machine learning. Haibo Xue, Guanjian Cheng, Wan-Jian Yin. Wiley Interdisciplinary Reviews: Computational Molecular Science, vol. 14, no. 5. Published 2024-10-01. DOI: 10.1002/wcms.1732.
Bokinala Moses Abraham, Mullapudi V. Jyothirmai, Priyanka Sinha, Francesc Viñes, Jayant K. Singh, Francesc Illas
The design and discovery of new and improved catalysts are driving forces for accelerating scientific and technological innovations in the fields of energy conversion, environmental remediation, and the chemical industry. Recently, the use of machine learning (ML) in combination with experimental and/or theoretical data has emerged as a powerful tool for identifying optimal catalysts for various applications. This review focuses on how ML algorithms can be used in computational catalysis and materials science to gain a deeper understanding of the relationships between materials properties and their stability, activity, and selectivity. The development of scientific data repositories, data mining techniques, and ML tools that can navigate structural optimization problems is highlighted, leading to the discovery of highly efficient catalysts for a sustainable future. Several data-driven ML models commonly used in catalysis research and their diverse applications in reaction prediction are discussed. The key challenges and limitations of using ML in catalysis research, which arise from the intrinsically complex nature of catalysts, are presented. Finally, we conclude by summarizing the potential future directions in the area of ML-guided catalyst development.
Catalysis in the digital age: Unlocking the power of data with machine learning. Bokinala Moses Abraham, Mullapudi V. Jyothirmai, Priyanka Sinha, Francesc Viñes, Jayant K. Singh, Francesc Illas. Wiley Interdisciplinary Reviews: Computational Molecular Science, vol. 14, no. 5. Published 2024-09-20. DOI: 10.1002/wcms.1730. PDF: https://onlinelibrary.wiley.com/doi/epdf/10.1002/wcms.1730
Leonardo S. G. Leite, Swarup Banerjee, Yihui Wei, Jackson Elowitt, Aurora E. Clark
Graph theory has a long history in chemistry. Yet as the breadth and variety of chemical data are rapidly changing, so too are the graph encoding methods and analyses that yield qualitative and quantitative insights. Using illustrative cases within a basic mathematical framework, we showcase modern chemical graph theory's utility in the chemist's analysis and model development toolkit. The encoding of both experimental and simulation data is discussed at various levels of granularity of information. This is followed by a discussion of the two major classes of graph theoretical analyses: identifying connectivity patterns and partitioning methods. Measures, metrics, descriptors, and topological indices are then introduced with an emphasis upon enhancing interpretability and incorporation into physical models. Challenging data cases are described that include strategies for studying time dependence. Throughout, we incorporate recent advancements in computer science and applied mathematics that are propelling chemical graph theory into new domains of chemical study.
Modern chemical graph theory. Leonardo S. G. Leite, Swarup Banerjee, Yihui Wei, Jackson Elowitt, Aurora E. Clark. Wiley Interdisciplinary Reviews: Computational Molecular Science, vol. 14, no. 5. Published 2024-09-18. DOI: 10.1002/wcms.1729.
Anastasiia S. Fedulova, Grigoriy A. Armeev, Tatiana A. Romanova, Lovepreet Singh-Palchevskaia, Nikita A. Kosarim, Nikita A. Motorin, Galina A. Komarova, Alexey K. Shaytan
Understanding the function of eukaryotic genomes, including the human genome, is undoubtedly one of the major scientific challenges of the 21st century. The cornerstone of eukaryotic genome organization is the nucleosome, an elementary building block of chromatin about 10 nm in size in which DNA is wrapped around an octamer of histone proteins. Nucleosomes are integral players in all genomic processes, including transcription, DNA replication and repair. They mediate genome regulation at the epigenetic level, bridging the discrete nature of the genetic information encoded in DNA with the analog physical nature of the intermolecular interactions required to access that information. Due to their relatively large size and dynamic nature, nucleosomes are difficult objects for experimental characterization. Molecular dynamics (MD) simulations have emerged over the years as a useful tool to complement experimental studies. Particularly in recent years, advances in computing power and refinements of MD force fields and codes have opened up new frontiers in terms of simulation timescales and quality for nucleosomes and related systems. It has become possible to elucidate in atomistic detail their functional dynamics modes such as DNA unwrapping and sliding, to characterize the effects of epigenetic modifications and of DNA and protein sequence variation on nucleosome structure and stability, and to describe the mechanisms governing nucleosome interactions with chromatin-associated proteins and the formation of supranucleosome structures. In this review, we systematically analyzed all-atom MD simulation studies of nucleosomes and related structures published since 2018 and discussed their relevance in the context of older studies, experimental data, and related coarse-grained and multiscale studies.
Molecular dynamics simulations of nucleosomes are coming of age. Anastasiia S. Fedulova, Grigoriy A. Armeev, Tatiana A. Romanova, Lovepreet Singh-Palchevskaia, Nikita A. Kosarim, Nikita A. Motorin, Galina A. Komarova, Alexey K. Shaytan. Wiley Interdisciplinary Reviews: Computational Molecular Science, vol. 14, no. 4. Published 2024-08-20. DOI: 10.1002/wcms.1728.
Jian Jiang, Lu Ke, Long Chen, Bozheng Dou, Yueying Zhu, Jie Liu, Bengong Zhang, Tianshou Zhou, Guo-Wei Wei
The transformer is the foundational architecture behind large language models. It is designed to handle sequential data by using self-attention mechanisms to weigh the importance of different elements, enabling efficient processing and understanding of complex patterns. Recently, transformer-based models have become some of the most popular and powerful deep learning (DL) algorithms in molecular science, owing to their distinctive architectural characteristics and proficiency in handling intricate data. These models leverage the capacity of transformer architectures to capture complex hierarchical dependencies within sequential data. As the applications of transformers in molecular science are very widespread, in this review we focus only on the technical aspects of transformer technology in the molecular domain. Specifically, we will provide an in-depth investigation into the algorithms of transformer-based machine learning techniques in molecular science. The models under consideration include generative pre-trained transformer (GPT), bidirectional and auto-regressive transformers (BART), bidirectional encoder representations from transformers (BERT), graph transformer, transformer-XL, text-to-text transfer transformer, vision transformers (ViT), detection transformer (DETR), conformer, contrastive language-image pre-training (CLIP), sparse transformers, and mobile and efficient transformers. By examining the inner workings of these models, we aim to elucidate how their architectural innovations contribute to their effectiveness in processing complex molecular data. We will also discuss promising trends in transformer models within the context of molecular science, emphasizing their technical capabilities and potential for interdisciplinary research. This review seeks to provide a comprehensive understanding of the transformer-based machine learning techniques that are driving advancements in molecular science.
Transformer technology in molecular science. Jian Jiang, Lu Ke, Long Chen, Bozheng Dou, Yueying Zhu, Jie Liu, Bengong Zhang, Tianshou Zhou, Guo-Wei Wei. Wiley Interdisciplinary Reviews: Computational Molecular Science, vol. 14, no. 4. Published 2024-08-04. DOI: 10.1002/wcms.1725. PDF: https://onlinelibrary.wiley.com/doi/epdf/10.1002/wcms.1725
Michele Nottoli, Michael F. Herbst, Aleksandr Mikhalev, Abhinav Jha, Filippo Lipparini, Benjamin Stamm
Polarizable continuum solvation models are popular in both quantum chemistry and biophysics, though typically with different requirements for the numerical methods. However, the recent trend of multiscale modeling can be expected to blur field-specific differences. In this regard, numerical methods based on domain decomposition (dd) have been demonstrated to be sufficiently flexible to be applied across all these levels of theory while remaining systematically accurate and efficient. In this contribution, we present ddX, an open-source implementation of dd-methods for various solvation models, which features a uniform interface with classical as well as quantum descriptions of the solute, or any hybrid versions thereof. We explain the key concepts of the library design and its application program interface, and demonstrate the integration of ddX into standard chemistry packages. Numerical tests illustrate the performance of ddX and its interfaces.
ddX: Polarizable continuum solvation from small molecules to proteins. Michele Nottoli, Michael F. Herbst, Aleksandr Mikhalev, Abhinav Jha, Filippo Lipparini, Benjamin Stamm. Wiley Interdisciplinary Reviews: Computational Molecular Science, vol. 14, no. 4. Published 2024-07-31. DOI: 10.1002/wcms.1726. PDF: https://onlinelibrary.wiley.com/doi/epdf/10.1002/wcms.1726
Daochi Zhang, Lyuzhou Ye, Jiaan Cao, Yao Wang, Rui-Xue Xu, Xiao Zheng, YiJing Yan
Many-body open quantum systems (OQSs) have a profound impact on various subdisciplines of physics, chemistry, and biology. The development of a computer program capable of simulating many-body OQSs accurately, efficiently, and flexibly is therefore highly desirable. In recent years, we have focused on the advancement of numerical algorithms based on the fermionic hierarchical equations of motion (HEOM) theory. Being exact in principle, this approach allows for the precise characterization of many-body correlations, non-Markovian memory, and non-equilibrium thermodynamic conditions. These efforts have now led to a new computer program, HEOM for QUantum Impurity with a Correlated Kernel, version 2 (HEOM-QUICK2), which, to the best of our knowledge, is currently the only general-purpose simulator for fermionic many-body OQSs. Compared with version 1, HEOM-QUICK2 features more efficient solvers for stationary states, more accurate treatment of non-Markovian memory, and improved numerical stability for long-time dissipative dynamics. Integrated with quantum chemistry software, HEOM-QUICK2 has become a valuable theoretical tool for the precise simulation of realistic many-body OQSs, particularly single atomic or molecular junctions. Furthermore, the unprecedented precision achieved by HEOM-QUICK2 enables accurate simulation of low-energy spin excitations and coherent spin relaxation. The unique usefulness of HEOM-QUICK2 is demonstrated through several examples of strongly correlated quantum impurity systems under non-equilibrium conditions. The new HEOM-QUICK2 program thus offers a powerful and comprehensive tool for studying many-body OQSs with exotic quantum phenomena and exploring applications in various disciplines.
{"title":"HEOM-QUICK2: A general-purpose simulator for fermionic many-body open quantum systems—An update","authors":"Daochi Zhang, Lyuzhou Ye, Jiaan Cao, Yao Wang, Rui-Xue Xu, Xiao Zheng, YiJing Yan","doi":"10.1002/wcms.1727","journal":{"name":"Wiley Interdisciplinary Reviews: Computational Molecular Science","volume":"14 4"},"publicationDate":"2024-07-31","publicationTypes":"Journal Article"}