A multidimensional dataset for structure-based machine learning
Pub Date : 2024-05-14 DOI: 10.1038/s43588-024-00631-6 Journal: Nature Computational Science
Matthew Holcomb, Stefano Forli
MISATO, a dataset for structure-based drug discovery, combines quantum mechanical property data and molecular dynamics simulations on ~20,000 protein–ligand structures. It substantially extends the amount of data available to the community and holds potential for advancing work in drug discovery.
MISATO: machine learning dataset of protein–ligand complexes for structure-based drug discovery
Pub Date : 2024-05-10 DOI: 10.1038/s43588-024-00627-2 Journal: Nature Computational Science
Open-access PDF: https://www.nature.com/articles/s43588-024-00627-2.pdf
Till Siebenmorgen, Filipe Menezes, Sabrina Benassou, Erinc Merdivan, Kieran Didi, André Santos Dias Mourão, Radosław Kitel, Pietro Liò, Stefan Kesselheim, Marie Piraud, Fabian J. Theis, Michael Sattler, Grzegorz M. Popowicz
Large language models have greatly enhanced our ability to understand biology and chemistry, yet robust methods for structure-based drug discovery, quantum chemistry and structural biology are still sparse. Precise biomolecule–ligand interaction datasets are urgently needed for large language models. To address this, we present MISATO, a dataset that combines quantum mechanical properties of small molecules and associated molecular dynamics simulations of ~20,000 experimental protein–ligand complexes, with extensive validation of the experimental data. Starting from the existing experimental structures, we used semi-empirical quantum mechanics to systematically refine them. The dataset also includes a large collection of molecular dynamics traces of protein–ligand complexes in explicit water, totaling over 170 μs. We provide example ML baseline models whose accuracy improves when they are trained on our data. An easy entry point for ML experts is provided to enable the next generation of drug discovery artificial intelligence models.
MISATO is a database for structure-based drug discovery that combines quantum mechanics data with molecular dynamics simulations on ~20,000 protein–ligand structures. The artificial intelligence models included provide an easy entry point for the machine learning and drug discovery communities.
Discrete latent embedding of single-cell chromatin accessibility sequencing data for uncovering cell heterogeneity
Pub Date : 2024-05-10 DOI: 10.1038/s43588-024-00625-4 Journal: Nature Computational Science
Xuejian Cui, Xiaoyang Chen, Zhen Li, Zijing Gao, Shengquan Chen, Rui Jiang
Single-cell epigenomic data have been growing at an unprecedented pace, but their high dimensionality and sparsity pose substantial challenges to downstream analysis. Although deep learning models, especially variational autoencoders, have been widely used to capture low-dimensional feature embeddings, the prevalent Gaussian assumption does not fully match real data, and these models tend to struggle to incorporate reference information from abundant cell atlases. Here we propose CASTLE, a deep generative model based on the vector-quantized variational autoencoder framework that extracts discrete latent embeddings to interpretably characterize single-cell chromatin accessibility sequencing data. We validate the performance and robustness of CASTLE for accurate cell-type identification and reasonable visualization against state-of-the-art methods. We demonstrate the advantages of CASTLE in effectively incorporating existing massive reference datasets in a weakly supervised or supervised manner. We further demonstrate CASTLE’s capacity to intuitively distill cell-type-specific feature spectra that quantitatively reveal cell heterogeneity and its biological implications.
A method based on a vector-quantized variational autoencoder, called CASTLE, can interpretably extract discrete latent embeddings and quantitatively generate the cell-type-specific feature spectrum for single-cell chromatin accessibility sequencing data.
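The step that distinguishes a vector-quantized autoencoder from a Gaussian one is the codebook lookup: each continuous encoder output is replaced by its nearest codebook vector, and the index of that vector becomes the discrete embedding. The minimal sketch below shows only this generic lookup in plain Python. It is not CASTLE's implementation; the function names, the toy codebook and the toy latents are hypothetical, and training details such as codebook learning, the commitment loss and straight-through gradients are omitted.

```python
from typing import List, Sequence, Tuple


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def quantize(latents: List[Sequence[float]],
             codebook: List[Sequence[float]]) -> Tuple[List[int], List[Sequence[float]]]:
    """Map each continuous latent vector to its nearest codebook entry.

    Returns the discrete code indices (the discrete embedding) and the
    quantized vectors that a decoder would receive. Generic VQ lookup only;
    hypothetical helper, not taken from any published codebase.
    """
    indices: List[int] = []
    quantized: List[Sequence[float]] = []
    for z in latents:
        # Index of the codeword closest to this latent vector.
        best = min(range(len(codebook)),
                   key=lambda k: squared_distance(z, codebook[k]))
        indices.append(best)
        quantized.append(codebook[best])
    return indices, quantized


if __name__ == "__main__":
    # Toy 2-D codebook and encoder outputs (made-up values).
    codebook = [[0.0, 0.0], [1.0, 1.0], [0.0, 2.0]]
    latents = [[0.1, -0.2], [0.9, 1.2], [0.2, 1.7]]
    idx, q = quantize(latents, codebook)
    print(idx)  # [0, 1, 2]
```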
Advancements in multicellular simulations
Pub Date : 2024-05-02 DOI: 10.1038/s43588-024-00624-5 Journal: Nature Computational Science
Domenic P. J. Germano, James M. Osborne
Multicellular modeling is increasingly being used to understand biological systems. SimuCell3D is a tool for developing and running mechanically realistic simulations based on the deformable cell model.
Harnessing quantum information to advance computing
Pub Date : 2024-04-26 DOI: 10.1038/s43588-024-00628-1 Journal: Nature Computational Science
Open-access PDF: https://www.nature.com/articles/s43588-024-00628-1.pdf
We highlight the vibrant discussions on quantum computing and quantum algorithms that took place at the 2024 American Physical Society March Meeting and invite submissions that notably drive the field of quantum information science forward.
In search of the most cooperative network
Pub Date : 2024-04-25 DOI: 10.1038/s43588-024-00623-6 Journal: Nature Computational Science
Valerio Capraro, Matjaž Perc
Cooperation is crucial for human prosperity, and population structure fosters it through both pairwise interactions and coordinated behavior in larger groups. A recent study explores the evolution of behavioral strategies in higher-order population structures that include pairwise and multi-way interactions, and reveals that higher-order interactions promote cooperation across networks, especially when those networks are formed by conjoined communities.
Annotating cell types in single-cell ATAC data via the guidance of the underlying DNA sequences
Pub Date : 2024-04-22 DOI: 10.1038/s43588-024-00626-3 Journal: Nature Computational Science
SANGO efficiently removed batch effects between query and reference single-cell ATAC signals by using the underlying genome sequences, enabling cell types to be assigned according to the reference data. The method achieved superior performance on diverse datasets and could detect unknown tumor cells, providing valuable functional biological signals.
Discovering metal complexes in vast chemical spaces
Pub Date : 2024-04-18 DOI: 10.1038/s43588-024-00618-3 Journal: Nature Computational Science
Approaches are needed to accelerate the discovery of transition metal complexes (TMCs), which is challenging owing to their vast chemical space. A large dataset of diverse ligands is now introduced and leveraged in a multiobjective genetic algorithm that enables the efficient optimization of TMCs in chemical spaces containing billions of candidate complexes.
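A multiobjective genetic algorithm ranks candidates by Pareto dominance rather than by a single score. The minimal sketch below shows only that selection building block, assuming every objective is to be maximized. The objective values are made-up placeholders, not properties from the ligand dataset, the function names are hypothetical, and the crossover and mutation steps that would act on ligand choices are left out.

```python
from typing import List, Sequence


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if candidate a is at least as good as b in every objective and
    strictly better in at least one (all objectives are maximized here)."""
    return (all(x >= y for x, y in zip(a, b))
            and any(x > y for x, y in zip(a, b)))


def pareto_front(scores: List[Sequence[float]]) -> List[int]:
    """Indices of candidates that no other candidate dominates.

    In a multiobjective GA, this non-dominated set is typically what
    survives selection into the next generation.
    """
    return [i for i, s in enumerate(scores)
            if not any(dominates(t, s) for j, t in enumerate(scores) if j != i)]


if __name__ == "__main__":
    # Two hypothetical objectives per candidate complex, both to be maximized.
    scores = [(0.9, 0.2), (0.5, 0.5), (0.4, 0.4), (0.1, 0.95), (0.5, 0.1)]
    print(pareto_front(scores))  # [0, 1, 3]
```

Candidates 2 and 4 drop out because candidate 1 is at least as good in both objectives and strictly better in one; the remaining three represent different trade-offs, none of which can be improved in one objective without losing in the other.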
Strategy evolution on higher-order networks
Pub Date : 2024-04-15 DOI: 10.1038/s43588-024-00621-8 Journal: Nature Computational Science
Anzhi Sheng, Qi Su, Long Wang, Joshua B. Plotkin
Cooperation is key to prosperity in human societies. Population structure is well understood as a catalyst for cooperation, and research has focused on pairwise interactions. But cooperative behaviors are not simply dyadic; they often involve coordinated behavior in larger groups. Here we develop a framework to study the evolution of behavioral strategies in higher-order population structures, which include pairwise and multi-way interactions. We provide an analytical treatment of when higher-order interactions will favor cooperation, accounting for arbitrary spatial heterogeneity and nonlinear rewards for cooperation in larger groups. Our results indicate that, in public goods games, higher-order interactions can promote the evolution of cooperation across a broad range of networks. Higher-order interactions consistently provide an advantage for cooperation when interaction hyper-networks feature multiple conjoined communities. Our analysis provides a systematic account of how higher-order interactions modulate the evolution of prosocial traits.
Cooperation is not merely a dyadic phenomenon; it also includes multi-way social interactions. A mathematical framework is developed to study how the structure of higher-order interactions influences cooperative behavior.
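To make the setting concrete, the minimal sketch below computes payoffs for a linear public goods game on a hypergraph, where each hyperedge is a group and a size-2 hyperedge is a pairwise interaction. It assumes that a cooperator pays cost c in every group it belongs to and that each group's pooled contributions are multiplied by r and split equally among all members. The paper's nonlinear rewards and its strategy-update dynamics are not modeled, and all names here are hypothetical.

```python
from typing import List, Sequence


def pgg_payoffs(hyperedges: List[Sequence[int]], strategies: Sequence[int],
                r: float, c: float = 1.0) -> List[float]:
    """Total payoff of each node from a linear public goods game played
    in every hyperedge (group) the node belongs to.

    strategies[i] is 1 for a cooperator and 0 for a defector. In each group,
    every cooperator pays cost c; the pooled contributions are multiplied
    by r and shared equally among all members, cooperators or not.
    """
    payoffs = [0.0] * len(strategies)
    for group in hyperedges:
        n = len(group)
        pot = r * c * sum(strategies[i] for i in group)  # multiplied pool
        share = pot / n                                   # equal split
        for i in group:
            payoffs[i] += share - c * strategies[i]       # cooperators pay c
    return payoffs


if __name__ == "__main__":
    # Four nodes: two 3-player groups plus one pairwise interaction (0, 3).
    groups = [(0, 1, 2), (1, 2, 3), (0, 3)]
    strategies = [1, 1, 0, 0]  # nodes 0 and 1 cooperate
    print(pgg_payoffs(groups, strategies, r=2.0))
```

In this example the defector at node 2 earns the most (a payoff of 2), which illustrates the social dilemma that population structure has to overcome for cooperation to spread. A full evolutionary simulation would feed these payoffs into an update rule that lets nodes copy more successful strategies.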