Recent advances in single‐cell chromatin accessibility sequencing (scCAS) technologies have yielded new insights into epigenomic heterogeneity and have increased the need for automatic cell type annotation. However, existing automatic annotation methods for scCAS data fail to incorporate reference data and neglect novel cell types that exist only in the test set. Here, we propose RAINBOW, a reference‐guided automatic annotation method based on a contrastive learning framework, which can effectively identify novel cell types in a test set. By using contrastive learning and incorporating reference data, RAINBOW can effectively characterize the heterogeneity of cell types, thereby facilitating more accurate annotation. With extensive experiments on multiple scCAS datasets, we show the advantages of RAINBOW over state‐of‐the‐art methods in annotating both known and novel cell types. We also verify the effectiveness of incorporating reference data during training. In addition, we demonstrate the robustness of RAINBOW to data sparsity and to the number of cell types. Furthermore, RAINBOW provides superior performance on newly sequenced data and can reveal biological implications in downstream analyses. Together, these results demonstrate the superior performance of RAINBOW in cell type annotation for scCAS data. We anticipate that RAINBOW will offer useful guidance for scCAS data analysis. The source code is available on GitHub (BioX‐NKU/RAINBOW).
{"title":"Accurate cell type annotation for single‐cell chromatin accessibility data via contrastive learning and reference guidance","authors":"Siyu Li, Songming Tang, Yunchang Wang, Sijie Li, Yuhang Jia, Shengquan Chen","doi":"10.1002/qub2.33","DOIUrl":"https://doi.org/10.1002/qub2.33","url":null,"PeriodicalId":45660,"journal":{"name":"Quantitative Biology","volume":null,"pages":null},"PeriodicalIF":3.1,"publicationDate":"2024-02-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139792754","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
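The contrastive learning idea named in the RAINBOW abstract can be illustrated with a generic InfoNCE-style loss. This is only a minimal sketch: the abstract does not state which loss RAINBOW uses, so the function names (`cosine`, `info_nce`), the temperature value, and the toy three-dimensional "cell embeddings" are all assumptions made for illustration. The loss is small when an anchor cell is closer to a cell of the same type than to cells of other types.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce(anchor, positive, negatives, temperature=0.5):
    """Generic InfoNCE loss: -log softmax of the positive pair's similarity.

    Hypothetical helper for illustration; not taken from the RAINBOW code.
    """
    logits = [cosine(anchor, positive) / temperature]
    logits += [cosine(anchor, n) / temperature for n in negatives]
    # Log-sum-exp with the maximum subtracted for numerical stability.
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_sum - logits[0]


# Toy embeddings (assumed values): one anchor, one same-type cell, two other-type cells.
anchor = [1.0, 0.0, 1.0]
same_type = [0.9, 0.1, 1.0]
other_types = [[0.0, 1.0, 0.0], [0.1, 0.9, 0.2]]

# Correct pairing: the positive is the same-type cell.
loss_good = info_nce(anchor, same_type, other_types)
# Wrong pairing: an other-type cell is treated as the positive.
loss_bad = info_nce(anchor, other_types[0], [same_type, other_types[1]])
print(loss_good < loss_bad)
```

Minimizing such a loss pulls embeddings of same-type cells together and pushes different types apart, which is the general property a contrastive framework relies on to separate cell types.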
Yazhen Song, Chenxi Feng, Difei Zhou, Zeng-Xin Ma, Lian He, Cong Zhang, Guihong Yu, Yan Zhao, Song Yang, Xinhui Xing
Developing methylotrophic cell factories that can efficiently convert organic one‐carbon (C1) feedstocks derived from the electrocatalytic reduction of carbon dioxide into bio‐based chemicals and biofuels is of strategic significance for building a carbon‐neutral, sustainable economic and industrial system. With the rapid advancement of RNA sequencing technology and mass spectrometry analysis, researchers have used these quantitative microbiology methods extensively, especially isotope‐based metabolic flux analysis, to study the metabolic processes that start from C1 feedstocks in natural C1‐utilizing bacteria and synthetic C1 bacteria. This paper reviews the recent use of advanced quantitative analysis to understand the metabolic network and basic principles of metabolism in natural C1‐utilizing bacteria grown on methane, methanol, or formate. The acquired knowledge serves as a guide for rewiring the central methylotrophic metabolism of natural C1‐utilizing bacteria to improve carbon conversion efficiency, and for engineering non‐C1‐utilizing bacteria into synthetic strains that can use C1 feedstocks as the sole carbon and energy source. This progress ultimately enhances the design and construction of highly efficient C1‐based cell factories for synthesizing diverse high value‐added products. The integration of quantitative biology and synthetic biology will advance the iterative understand–design–build–test–learn cycle and enhance C1‐based biomanufacturing in the future.
{"title":"Constructing efficient bacterial cell factories to enable one‐carbon utilization based on quantitative biology: A review","authors":"Yazhen Song, Chenxi Feng, Difei Zhou, Zeng-Xin Ma, Lian He, Cong Zhang, Guihong Yu, Yan Zhao, Song Yang, Xinhui Xing","doi":"10.1002/qub2.31","DOIUrl":"https://doi.org/10.1002/qub2.31","url":null,"PeriodicalId":45660,"journal":{"name":"Quantitative Biology","volume":null,"pages":null},"PeriodicalIF":3.1,"publicationDate":"2024-02-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139791768","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Gene regulatory network (GRN) inference from gene expression data is an important approach to understanding aspects of the biological system. Compared with generalized correlation‐based methods, causality‐inspired methods seem better suited to inferring regulatory relationships. We propose GRINCD, a novel GRN inference framework based on graph representation learning and causal asymmetric learning that considers both linear and non‐linear regulatory relationships. First, a high‐quality representation of each gene is generated using a graph neural network. Then, we apply the additive noise model to predict the causal regulation of each regulator‐target pair. Additionally, we design two channels and combine them for robust prediction. In comprehensive comparisons with state‐of‐the‐art methods based on different principles, on numerous datasets of diverse types and scales, our framework achieves superior or comparable performance under various evaluation metrics. Our work provides a new approach to constructing GRNs, and GRINCD also shows potential for identifying key factors affecting cancer development.
{"title":"Gene regulatory network inference based on causal discovery integrating with graph neural network","authors":"Ke Feng, Hongyang Jiang, Chaoyi Yin, Huiyan Sun","doi":"10.1002/qub2.26","DOIUrl":"https://doi.org/10.1002/qub2.26","url":null,"PeriodicalId":45660,"journal":{"name":"Quantitative Biology","volume":null,"pages":null},"PeriodicalIF":3.1,"publicationDate":"2023-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139022894","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
The information on host–microbe interactions contained in the operational taxonomic unit (OTU) abundance table can serve as a clue to understanding the biological traits of OTUs and samples. Some studies have inferred the taxonomies or functions of OTUs by constructing co‐occurrence networks, but such networks can encompass only a small fraction of all OTUs because the OTU table is highly sparse. Few studies have intensively explored and used the information on sample‐OTU interactions. This study constructed a sample‐OTU heterogeneous information network and represented its nodes with a heterogeneous graph embedding method to form an OTU space and a sample space. Using the embedded OTU and sample vectors together with the original OTU abundance information, an Integrated Model of Embedded Taxonomies and Abundance (IMETA) was proposed for predicting sample attributes, such as phenotypes and individual diet habits. Both the OTU space and the sample space contain reasonable biological or medical semantic information, and IMETA, using the embedded OTU and sample vectors, performs stably and well in sample classification tasks. This suggests that the embedding representation based on the sample‐OTU heterogeneous information network can provide more useful information for understanding microbiome samples. This study produced quantitative representations of the biological characteristics of OTUs and samples, which is a step toward making fuller use of the information in the OTU abundance table, and it promotes a deeper understanding of the human microbiome.
{"title":"Reorganizing heterogeneous information from host–microbe interaction reveals innate associations among samples","authors":"Hongfei Cui","doi":"10.1002/qub2.25","DOIUrl":"https://doi.org/10.1002/qub2.25","url":null,"PeriodicalId":45660,"journal":{"name":"Quantitative Biology","volume":null,"pages":null},"PeriodicalIF":3.1,"publicationDate":"2023-11-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139214325","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
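The sample‐OTU network described in the abstract above can be pictured as a bipartite graph built from the nonzero entries of the abundance table. The sketch below covers only that construction step and says nothing about the embedding method, which the abstract does not specify. The function name `build_sample_otu_network`, the use of relative abundance as the edge weight, and the toy table are assumptions for illustration.

```python
from collections import defaultdict


def build_sample_otu_network(abundance):
    """Build a weighted bipartite graph from an OTU abundance table.

    `abundance` maps sample name -> {OTU name -> non-negative count}.
    Nodes are tagged tuples ("sample", name) or ("otu", name), so the two
    node types stay distinct. Edge weight = relative abundance (assumed).
    """
    adjacency = defaultdict(dict)
    for sample, counts in abundance.items():
        total = sum(counts.values())
        for otu, count in counts.items():
            if count > 0:  # zero entries of the sparse table create no edge
                weight = count / total
                adjacency[("sample", sample)][("otu", otu)] = weight
                adjacency[("otu", otu)][("sample", sample)] = weight
    return dict(adjacency)


# Toy abundance table (assumed values).
table = {
    "S1": {"OTU_a": 30, "OTU_b": 0, "OTU_c": 10},
    "S2": {"OTU_a": 0, "OTU_b": 5, "OTU_c": 5},
}
network = build_sample_otu_network(table)

# OTU_c is linked to both samples, so it connects S1 and S2 in the graph.
print(sorted(network[("otu", "OTU_c")]))
```

Unlike an OTU co‐occurrence network, every OTU with at least one nonzero count gets a node here, which is one way to read the abstract's point about sparsity.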
Dali Wang, Jiaxuan Li, Lei Wang, Yipeng Cao, Bo Kang, Xiangfei Meng, Sai Li, Chen Song
The causative pathogen of coronavirus disease 2019 (COVID‐19), severe acute respiratory syndrome coronavirus 2 (SARS‐CoV‐2), is an enveloped virus assembled from a lipid envelope and multiple structural proteins. In this study, by integrating experimental data, structural modeling, and both coarse‐grained and all‐atom molecular dynamics (MD) simulations, we constructed multiscale models of SARS‐CoV‐2. Our 500‐ns coarse‐grained simulation of the intact virion allowed us to investigate the dynamic behavior of the membrane‐embedded proteins and the surrounding lipid molecules in situ. Our results indicated that the membrane‐embedded proteins are highly dynamic, and that certain types of lipids exhibit different binding preferences for specific sites on the membrane‐embedded proteins. The equilibrated virion model was converted to atomic resolution, which provided a 3D structure for scientific demonstration and can serve as a framework for future exascale all‐atom MD simulations. A short all‐atom MD simulation of 255 ps was conducted as a preliminary test for large‐scale simulations of this complex system.
{"title":"Toward atomistic models of intact severe acute respiratory syndrome coronavirus 2 via Martini coarse‐grained molecular dynamics simulations","authors":"Dali Wang, Jiaxuan Li, Lei Wang, Yipeng Cao, Bo Kang, Xiangfei Meng, Sai Li, Chen Song","doi":"10.1002/qub2.20","DOIUrl":"https://doi.org/10.1002/qub2.20","url":null,"PeriodicalId":45660,"journal":{"name":"Quantitative Biology","volume":null,"pages":null},"PeriodicalIF":3.1,"publicationDate":"2023-11-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139223234","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Creating man‐made life in the laboratory is one of life science’s most intriguing yet challenging problems. Advances in synthetic biology and related theories, particularly those concerning the origin of life, have laid the groundwork for further exploration and understanding in the field of artificial or man‐made life. But a wealth of quantitative mathematical models and tools has yet to be applied to this area. In this paper, we review the two main approaches employed in the field of man‐made life: the top‐down approach, which reduces the complexity of extant living systems, and the bottom‐up approach, which integrates well‐defined components. For each, we introduce the theoretical basis, recent advances, and limitations. We then argue for another possible approach, namely “bottom‐up from the origin of life”: starting with the establishment of autocatalytic chemical reaction networks that employ physical boundaries as the initial compartments, then designing directed evolutionary systems, with the expectation that independent compartments will eventually emerge so that the system becomes free‐living. This approach is analogous to the process by which life originated. With this paper, we aim to encourage synthetic biologists and experimentalists to consider a more theoretical perspective, and to promote communication between the origin‐of‐life community and the synthetic man‐made life community.
{"title":"Theoretical perspective on synthetic man‐made life: Learning from the origin of life","authors":"Lu Peng, Zecheng Zhang, Xianyi Wang, Weiyi Qiu, Liqian Zhou, Hui Xiao, Chunxiuzi Liu, Shaohua Tang, Zhiwei Qin, Jiakun Jiang, Zengru Di, Yu Liu","doi":"10.1002/qub2.22","DOIUrl":"https://doi.org/10.1002/qub2.22","url":null,"PeriodicalId":45660,"journal":{"name":"Quantitative Biology","volume":null,"pages":null},"PeriodicalIF":3.1,"publicationDate":"2023-11-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139231000","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Electroactive microorganisms (EAMs) can use extracellular electron transfer (EET) pathways to exchange electrons and energy with their external surroundings. Conductive cytochrome proteins and nanowires play crucial roles in controlling the electron transfer rate from the cytosol to the extracellular electrode. Many previous studies have elucidated how c‐type cytochrome proteins and conductive nanowires are synthesized, assembled, and engineered to manipulate the EET rate, and have quantified the kinetic processes of electron generation and EET. Here, we first give an overview of the electron transfer pathways of EAMs and quantify the kinetic parameters that dictate intracellular electron production and EET. Second, we systematically review the structures, conductivity mechanisms, and engineering strategies of conductive cytochromes and nanowires in EAMs. Lastly, we outline potential directions for future research on cytochromes and conductive nanowires for enhanced electron transfer. This article reviews the quantitative kinetics of intracellular electron production and EET, and the contribution of engineered c‐type cytochromes and conductive nanowires to enhancing the EET rate, which lays the foundation for enhancing the electron transfer capacity of EAMs.
{"title":"Conductive proteins‐based extracellular electron transfer of electroactive microorganisms","authors":"Junqi Zhang, Zixuan You, Dingyuan Liu, Rui Tang, Chao Zhao, Yingxiu Cao, Feng Li, Hao-Qing Song","doi":"10.1002/qub2.24","DOIUrl":"https://doi.org/10.1002/qub2.24","url":null,"PeriodicalId":45660,"journal":{"name":"Quantitative Biology","volume":null,"pages":null},"PeriodicalIF":3.1,"publicationDate":"2023-11-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139228917","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
The prediction of molecular properties is a crucial task in drug discovery. Computational methods that can accurately predict molecular properties can significantly accelerate the drug discovery process and reduce its cost. In recent years, successive generations of computing hardware and the rise of deep learning have created a new and effective path for molecular property prediction. Deep learning methods can leverage the vast amount of data accumulated over the years in drug discovery and do not require complex feature engineering. In this review, we summarize the molecular representations and commonly used datasets in molecular property prediction models and present advanced deep learning methods for molecular property prediction, including state‐of‐the‐art deep learning networks such as graph neural networks and Transformer‐based models, as well as state‐of‐the‐art deep learning strategies such as 3D pre‐training, contrastive learning, multi‐task learning, transfer learning, and meta‐learning. We also point out some critical issues, such as the lack of datasets, low information utilization, and the lack of disease specificity.
{"title":"Advanced deep learning methods for molecular property prediction","authors":"Chao Pang, Henry H. Y. Tong, Leyi Wei","doi":"10.1002/qub2.23","DOIUrl":"https://doi.org/10.1002/qub2.23","url":null,"PeriodicalId":45660,"journal":{"name":"Quantitative Biology","volume":null,"pages":null},"PeriodicalIF":3.1,"publicationDate":"2023-11-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139259366","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Feiran Li, Yu Chen, Johan Gustafsson, Hao Wang, Yi Wang, Chong Zhang, Xinhui Xing
Over the last 15 years, genome‐scale metabolic models (GEMs) have been reconstructed for humans and model animals, such as mouse and rat, to systematically understand metabolism, simulate multicellular or multi‐tissue interplay, understand human diseases, and guide cell factory design for biopharmaceutical protein production. Here, we describe how metabolic networks can be represented using stoichiometric matrices and well‐defined constraints for flux simulation. Then, we review the history of GEM development for the quantitative understanding of Homo sapiens and other relevant animals, together with their applications. We describe how models have developed from H. sapiens to other animals and from generic purposes to precise context‐specific simulation. The progress of GEMs for animals greatly expands our systematic understanding of metabolism in humans and related animals. We discuss the difficulties and present perspectives on GEM development and on the quest to integrate more biological processes and omics data for future research and translation. We hope that this review can inspire new models for other mammalian organisms and new algorithms for integrating big data, enabling more in‐depth analysis and further progress in human health and biopharmaceutical engineering.
{"title":"Genome‐scale metabolic models applied for human health and biopharmaceutical engineering","authors":"Feiran Li, Yu Chen, Johan Gustafsson, Hao Wang, Yi Wang, Chong Zhang, Xinhui Xing","doi":"10.1002/qub2.21","DOIUrl":"https://doi.org/10.1002/qub2.21","url":null,"PeriodicalId":45660,"journal":{"name":"Quantitative Biology","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2023-11-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"136352058","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Recently, Quantitative Biology (QB) held a discussion on “AI (artificial intelligence) for Life Science” among editorial board members and interested scholars, in anticipation of the rapid development of this growing area after the AlphaGo and ChatGPT mania. Many young people tend to confuse fact with fiction; heated debates are unavoidable even among their mentors. When deep learning, as represented by convolutional neural networks and LSTM (long short-term memory), became available to bioinformatics students, many of them rushed into this research field and tried to adopt these methods in all their projects without knowing the history: these tools became successful in step with Moore’s Law (relating to rapid advances in computer technology), but more importantly because of new structural/functional understanding of the visual and auditory circuits in the brain. Recently, some young people have claimed “LSTM is dead, long live transformer” (which is somewhat like saying “the bike is dead, long live the car”), and have amplified the threat that ChatGPT could wipe out human jobs. They believe the transformer is the “silver bullet” for all learning tasks, clearly reflecting their lack of basic knowledge (i.e., the “No Free Lunch” theorem: the trade-off of such a global “attention network” is the price paid for complexity, namely difficulty of training and high memory costs). There is no doubt that ML (machine learning) and AI have brought a new revolution in science and technology and will have a huge, unforeseeable impact on everyday human life as well as on social relationships. In this context, the QB journal could be a great platform for encouraging intellectual discussions and for promoting AI for Life Science. Here, I would like to use the DIALOG to “抛砖引玉” (make some initial remarks to get the ball rolling), although this is my personal opinion, which is inevitably subject to bias and limitations. 
AI: Do you know that my name, “Artificial Intelligence,” is defined by the Oxford English Dictionary as the capacity of computer systems (which may be referred to as “robots”) to exhibit or simulate your intelligent behavior? NI: Wait a minute, intelligence itself is defined as the ability to learn, understand, and think in a logical way. Can you think? AI: No. But that definition is too restrictive; intelligence actually has different scopes and degrees. Simple intelligent control devices date back to antiquity, from windmills to thermostats. NI: Agreed, everything is relative. Macromolecules (e.g., enzymes) and cells (e.g., immune cells) might be considered intelligent; see how a white blood cell chases bacteria on YouTube (search for “Crawling neutrophil chasing a bacterium”). Our emergent/collective intelligent behavior does not require a brain or even a neuron; see how slime molds can solve an optimization problem (the Hamiltonian cycle problem) more effectively than a human on YouTube (search for “Intelligence without a brain?”). Before there was any neuron, C
{"title":"Dialog between artificial intelligence & natural intelligence","authors":"Michael Q. Zhang","doi":"10.1002/qub2.5","DOIUrl":"https://doi.org/10.1002/qub2.5","url":null,"abstract":"Recently, Quantitative Biology (QB) held a discussion on “AI (artificial intelligence) for Life Science” among editorial board members and interested scholars, in anticipation of the rapid development of this growing area after the AlphaGo and ChatGPT mania. Many young people tend to confuse fact with fiction; heated debates are unavoidable even among their mentors. When deep learning, as represented by convolutional neural networks and LSTM (long short-term memory), became available to bioinformatics students, many of them rushed into this research field and tried to adopt these methods in all their projects without knowing the history: these tools became successful in step with Moore’s Law (relating to rapid advances in computer technology), but more importantly because of new structural/functional understanding of the visual and auditory circuits in the brain. Recently, some young people have claimed “LSTM is dead, long live transformer” (which is somewhat like saying “the bike is dead, long live the car”), and have amplified the threat that ChatGPT could wipe out human jobs. They believe the transformer is the “silver bullet” for all learning tasks, clearly reflecting their lack of basic knowledge (i.e., the “No Free Lunch” theorem: the trade-off of such a global “attention network” is the price paid for complexity, namely difficulty of training and high memory costs). There is no doubt that ML (machine learning) and AI have brought a new revolution in science and technology and will have a huge, unforeseeable impact on everyday human life as well as on social relationships. In this context, the QB journal could be a great platform for encouraging intellectual discussions and for promoting AI for Life Science. 
Here, I would like to use the DIALOG to “抛砖引玉” (make some initial remarks to get the ball rolling), although this is my personal opinion, which is inevitably subject to bias and limitations. AI: Do you know that my name, “Artificial Intelligence,” is defined by the Oxford English Dictionary as the capacity of computer systems (which may be referred to as “robots”) to exhibit or simulate your intelligent behavior? NI: Wait a minute, intelligence itself is defined as the ability to learn, understand, and think in a logical way. Can you think? AI: No. But that definition is too restrictive; intelligence actually has different scopes and degrees. Simple intelligent control devices date back to antiquity, from windmills to thermostats. NI: Agreed, everything is relative. Macromolecules (e.g., enzymes) and cells (e.g., immune cells) might be considered intelligent; see how a white blood cell chases bacteria on YouTube (search for “Crawling neutrophil chasing a bacterium”). Our emergent/collective intelligent behavior does not require a brain or even a neuron; see how slime molds can solve an optimization problem (the Hamiltonian cycle problem) more effectively than a human on YouTube (search for “Intelligence without a brain?”). Before there was any neuron, C","PeriodicalId":45660,"journal":{"name":"Quantitative Biology","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2023-11-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135974128","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}