This study addresses the challenging task of identifying viruses within metagenomic data, which encompasses a broad array of biological samples, including animal reservoirs, environmental sources, and the human body. Traditional methods for virus identification often face limitations due to the diversity and rapid evolution of viral genomes. In response, recent efforts have focused on leveraging artificial intelligence (AI) techniques to enhance accuracy and efficiency in virus detection. However, existing AI-based approaches are primarily binary classifiers that do not identify viral types and that rely on nucleotide sequences. To address these limitations, this study introduces VirDetect-AI, a novel tool specifically designed for the identification of eukaryotic viruses within metagenomic datasets. The VirDetect-AI model combines convolutional neural networks and residual neural networks to extract hierarchical features and detailed patterns from complex amino acid sequence data. The model performed strongly on all metrics, with a sensitivity of 0.97, a precision of 0.98, and an F1-score of 0.98. VirDetect-AI can accurately classify metagenomic sequences into 980 viral protein classes, thereby enabling the identification of new viruses and improving our understanding of viral ecology. These classes encompass an extensive array of viral genera and families, as well as protein functions and hosts.
{"title":"VirDetect-AI: a residual and convolutional neural network-based metagenomic tool for eukaryotic viral protein identification.","authors":"Alida Zárate, Lorena Díaz-González, Blanca Taboada","doi":"10.1093/bib/bbaf001","journal":{"name":"Briefings in bioinformatics","volume":"26 1"},"publicationDate":"2024-11-22","publicationTypes":"Journal Article","openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11729733/pdf/"}
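The combination of convolutional and residual layers mentioned in the VirDetect-AI abstract can be illustrated with a minimal NumPy sketch. This is not the authors' architecture: the one-hot encoding, the kernel size, the channel counts, and the names `one_hot`, `conv1d_same`, and `residual_block` are all assumptions made for illustration. It only shows the general idea of a residual block, where the block's input is added back to the output of two stacked convolutions.

```python
import numpy as np

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def one_hot(seq):
    """Encode a protein sequence as a (length, 20) one-hot matrix."""
    index = {aa: i for i, aa in enumerate(AMINO_ACIDS)}
    x = np.zeros((len(seq), len(AMINO_ACIDS)))
    for pos, aa in enumerate(seq):
        if aa in index:  # unknown residues (e.g. X) stay all-zero
            x[pos, index[aa]] = 1.0
    return x


def conv1d_same(x, w):
    """Zero-padded 1D convolution (cross-correlation, as in most DL libraries).

    x has shape (L, C_in); w has shape (K, C_in, C_out) with K odd.
    The output has shape (L, C_out).
    """
    k = w.shape[0]
    pad = k // 2
    xp = np.pad(x, ((pad, pad), (0, 0)))
    out = np.zeros((x.shape[0], w.shape[2]))
    for i in range(x.shape[0]):
        window = xp[i:i + k]  # (K, C_in) slice centred on position i
        out[i] = np.tensordot(window, w, axes=([0, 1], [0, 1]))
    return out


def residual_block(x, w1, w2):
    """Hypothetical residual block: relu(x + conv(relu(conv(x))))."""
    h = np.maximum(conv1d_same(x, w1), 0.0)
    h = conv1d_same(h, w2)
    return np.maximum(x + h, 0.0)  # skip connection, then ReLU


rng = np.random.default_rng(0)
x = one_hot("MKVLA")
w1 = rng.normal(scale=0.1, size=(3, 20, 20))
w2 = rng.normal(scale=0.1, size=(3, 20, 20))
y = residual_block(x, w1, w2)  # same shape as x: (5, 20)
```

Because of the skip connection, a block whose convolution weights are all zero simply passes its (non-negative) input through, which is one reason residual networks are easier to train when many blocks are stacked.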
Jordi Martorell-Marugán, Raúl López-Domínguez, Juan Antonio Villatoro-García, Daniel Toro-Domínguez, Marco Chierici, Giuseppe Jurman, Pedro Carmona-Sáez
Recent advances in single-cell RNA-Sequencing (scRNA-Seq) technologies have revolutionized our ability to gather molecular insights into different phenotypes at the level of individual cells. The analysis of the resulting data poses significant challenges, and proper statistical methods are required to analyze and extract information from scRNA-Seq datasets. Sample classification based on gene expression data has proven effective and valuable for precision medicine applications. However, standard classification schemas are often not suitable for scRNA-Seq data because of their unique characteristics, and new algorithms are required to effectively analyze and classify samples at the single-cell level. Furthermore, existing methods for this purpose have limitations in their usability. These limitations motivated us to develop singleDeep, an end-to-end pipeline that streamlines the analysis of scRNA-Seq data by training deep neural networks, enabling robust prediction and characterization of sample phenotypes. We used singleDeep to make predictions on scRNA-Seq datasets from different conditions, including systemic lupus erythematosus, Alzheimer's disease and coronavirus disease 2019. Our results demonstrate strong diagnostic performance, validated both internally and externally. Moreover, singleDeep outperformed traditional machine learning methods and alternative single-cell approaches. In addition to prediction accuracy, singleDeep provides valuable insights into cell types and gene importance estimation for phenotypic characterization. This functionality provided additional insight in our use cases. For instance, we corroborated that some interferon signature genes are consistently relevant for autoimmunity across all immune cell types in lupus. On the other hand, we discovered that genes linked to dementia have relevant roles in specific brain cell populations, such as APOE in astrocytes.
{"title":"Explainable deep neural networks for predicting sample phenotypes from single-cell transcriptomics.","authors":"Jordi Martorell-Marugán, Raúl López-Domínguez, Juan Antonio Villatoro-García, Daniel Toro-Domínguez, Marco Chierici, Giuseppe Jurman, Pedro Carmona-Sáez","doi":"10.1093/bib/bbae673","journal":{"name":"Briefings in bioinformatics","volume":"26 1"},"publicationDate":"2024-11-22","publicationTypes":"Journal Article","openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11735047/pdf/"}
Xikang Feng, Miaozhe Huo, He Li, Yongze Yang, Yuepeng Jiang, Liang He, Shuai Cheng Li
The complexity of T cell receptor (TCR) sequences, particularly within the complementarity-determining region 3 (CDR3), requires efficient embedding methods for applying machine learning to immunology. While various TCR CDR3 embedding strategies have been proposed, the absence of a systematic evaluation has caused confusion in the community. Here, we extracted CDR3 embedding models from 19 existing methods and benchmarked these models on four curated datasets by assessing their impact on the performance of TCR downstream tasks, including TCR-epitope binding affinity prediction, epitope-specific TCR identification, TCR clustering, and visualization analysis. We assessed these models using eight downstream classifiers and five downstream clustering methods, with performance measured by a diverse range of metrics for precision, robustness, and usability. Overall, handcrafted embeddings outperformed data-driven ones in modeling TCR-epitope interactions. To make our comparative findings easier to apply, we developed an all-in-one TCR CDR3 embedding package comprising all evaluated embedding models. This package will assist users in easily selecting suitable embedding models for their data.
{"title":"A comprehensive benchmarking for evaluating TCR embeddings in modeling TCR-epitope interactions.","authors":"Xikang Feng, Miaozhe Huo, He Li, Yongze Yang, Yuepeng Jiang, Liang He, Shuai Cheng Li","doi":"10.1093/bib/bbaf030","journal":{"name":"Briefings in bioinformatics","volume":"26 1"},"publicationDate":"2024-11-22","publicationTypes":"Journal Article","openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11781202/pdf/"}
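To make the distinction between handcrafted and data-driven embeddings in the TCR benchmarking abstract concrete, here is a small sketch of one very simple handcrafted embedding: the amino-acid composition of a CDR3 sequence. It is an illustrative assumption, not one of the 19 benchmarked models, and the function names and example sequences are hypothetical. A handcrafted embedding like this is computed by a fixed rule, whereas a data-driven embedding is learned from sequence data.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def composition_embedding(cdr3):
    """Map a CDR3 string to a 20-dimensional vector of residue frequencies."""
    if not cdr3:
        return [0.0] * len(AMINO_ACIDS)
    counts = Counter(cdr3)
    return [counts.get(aa, 0) / len(cdr3) for aa in AMINO_ACIDS]


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


# Hypothetical CDR3 fragments, used only to exercise the functions.
e1 = composition_embedding("CASSL")
e2 = composition_embedding("CASSF")
similarity = cosine(e1, e2)
```

Composition ignores residue order, so two different CDR3s with the same residues receive identical vectors. That kind of trade-off between simplicity and expressiveness is what a downstream benchmark is meant to expose.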
Sepsis is a dangerous bodily response triggered by infection. The transcriptional expression patterns of host responses aid in the diagnosis of sepsis, but models built on them generalize poorly. To facilitate sepsis diagnosis, we present scPAGE2, an updated version of single-cell Pair-wise Analysis of Gene Expression (scPAGE) that uses a transfer learning method dedicated to data fusion between single-cell and bulk transcriptomes. Compared to scPAGE, scPAGE2 uses improved Differentially Expressed Gene Pairs (DEPs) to pretrain a model on the single-cell transcriptome and then retrains it using bulk transcriptome data to construct a sepsis diagnostic model, which effectively transfers cell-layer information from the single-cell to the bulk transcriptome. Seven datasets across three transcriptome platforms and fluorescence-activated cell sorting (FACS) were used for performance validation. The model involved four DEPs and showed robust performance across next-generation sequencing and microarray platforms, surpassing state-of-the-art models with an average AUROC of 0.947 and an average AUPRC of 0.987. Analysis of scRNA-seq data revealed higher proportions of cells with JAM3-PIK3AP1 expression in sepsis monocytes and decreased ARG1-CCR7 in B and T cells. Elevated IRF6-HP in sepsis monocytes was confirmed by both scRNA-seq and an independent cohort using FACS. Both the superior performance of the model and the in vitro validation of IRF6-HP in monocytes indicate that scPAGE2 is effective and robust for constructing a sepsis diagnostic model. We additionally applied scPAGE2 to acute myeloid leukemia and demonstrated its superior classification performance. Overall, we provide a strategy for improving the generalizability of classification models that can be adapted to a broad range of clinical prediction scenarios.
{"title":"PAGE-based transfer learning from single-cell to bulk sequencing enhances model generalization for sepsis diagnosis.","authors":"Nana Jin, Chuanchuan Nan, Wanyang Li, Peijing Lin, Yu Xin, Jun Wang, Yuelong Chen, Yuanhao Wang, Kaijiang Yu, Changsong Wang, Chunbo Chen, Qingshan Geng, Lixin Cheng","doi":"10.1093/bib/bbae661","journal":{"name":"Briefings in bioinformatics","volume":"26 1"},"publicationDate":"2024-11-22","publicationTypes":"Journal Article"}
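One reason gene-pair features may transfer across sequencing platforms is that they depend only on the relative order of two genes within a sample. The sketch below illustrates that idea under stated assumptions: the abstract does not define how scPAGE2 scores a pair, so the rule "1 if the first gene is expressed above the second" is a guess made for illustration. The three pairs are the ones named in the abstract (the fourth DEP is not named there), and the expression values are invented toy numbers.

```python
import math

# Gene pairs named in the abstract; the comparison direction is an assumption.
PAIRS = [("JAM3", "PIK3AP1"), ("ARG1", "CCR7"), ("IRF6", "HP")]


def pair_features(expr, pairs):
    """Return one binary feature per pair: 1 if expr[a] > expr[b], else 0."""
    return [1 if expr[a] > expr[b] else 0 for a, b in pairs]


# Toy expression values for one sample (made up, not from the study).
sample = {"JAM3": 5.2, "PIK3AP1": 3.1, "ARG1": 7.4,
          "CCR7": 2.0, "IRF6": 4.8, "HP": 6.5}

# A monotone transform, standing in for a platform-specific scale change.
rescaled = {gene: math.log2(value + 1.0) for gene, value in sample.items()}

features_raw = pair_features(sample, PAIRS)
features_rescaled = pair_features(rescaled, PAIRS)
```

Both calls give the same feature vector because a monotone rescaling preserves within-sample ordering. This is the property that makes rank-based pair features attractive when a model trained on one platform is applied to another.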
The identification of cancer driver genes is crucial for understanding the complex processes involved in cancer development, progression, and therapeutic strategies. Multi-omics data and biological networks provided by numerous databases enable the application of graph deep learning techniques that incorporate network structures into the deep learning framework. However, most existing methods do not account for heterophily in biological networks, which limits model performance. In addition, feature confusion often arises when graph neural network models are applied to such graphs. To address this, we propose a Simplified Graph neural network for identifying Cancer Driver genes in heterophilic networks (SGCD), which comprises two main components: a graph convolutional neural network with representation separation and a bimodal feature extractor. The results demonstrate that SGCD not only performs exceptionally well but also exhibits robust discriminative capabilities compared to state-of-the-art methods across all benchmark datasets. Moreover, subsequent interpretability experiments on both the model and biological aspects provide compelling evidence supporting the reliability of SGCD. Additionally, the model can dissect gene modules, revealing clearer connections between driver genes in cancers. We are confident that SGCD holds potential in the field of precision oncology and may be applied to predict biomarkers for a wide range of complex diseases.
{"title":"Towards simplified graph neural networks for identifying cancer driver genes in heterophilic networks.","authors":"Xingyi Li, Jialuo Xu, Junming Li, Jia Gu, Xuequn Shang","doi":"10.1093/bib/bbae691","journal":{"name":"Briefings in bioinformatics","volume":"26 1"},"publicationDate":"2024-11-22","publicationTypes":"Journal Article","openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11697181/pdf/"}
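Heterophily, which the SGCD abstract identifies as a problem for standard graph neural networks, is commonly quantified with the edge homophily ratio: the fraction of edges whose two endpoints share a label. The sketch below computes it on a toy network. The node names and labels are hypothetical and not taken from the paper, and the abstract does not say which heterophily measure the authors use.

```python
def edge_homophily(edges, labels):
    """Fraction of edges that connect two nodes with the same label.

    Values near 1 indicate a homophilic graph; values near 0 a heterophilic one.
    """
    if not edges:
        raise ValueError("edge list is empty")
    same = sum(1 for u, v in edges if labels[u] == labels[v])
    return same / len(edges)


# Hypothetical gene network: g1 and g4 are drivers, g2 and g3 are not.
labels = {"g1": "driver", "g2": "non-driver", "g3": "non-driver", "g4": "driver"}
edges = [("g1", "g2"), ("g1", "g3"), ("g2", "g4"), ("g3", "g4"), ("g2", "g3")]

ratio = edge_homophily(edges, labels)
```

In this toy network only one of the five edges joins same-label nodes, so neighbourhood averaging would mostly mix driver and non-driver features. That mixing is the kind of feature confusion the abstract refers to.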
Recently, the impressive performance of large language models (LLMs) on a wide range of tasks has attracted an increasing number of attempts to apply LLMs in drug discovery. However, molecule optimization, a critical task in the drug discovery pipeline, has so far seen little involvement from LLMs. Most existing approaches focus solely on capturing the underlying patterns in chemical structures provided by the data, without taking advantage of expert feedback. These non-interactive approaches overlook the fact that drug discovery requires the integration of expert experience and iterative refinement. To address this gap, we propose DrugAssist, an interactive molecule optimization model that performs optimization through human-machine dialogue by leveraging the strong interactivity and generalizability of LLMs. DrugAssist has achieved leading results in both single and multiple property optimization, while also showing strong potential in transferability and iterative optimization. In addition, we publicly release a large instruction-based dataset called 'MolOpt-Instructions' for fine-tuning language models on molecule optimization tasks. We have made our code and data publicly available at https://github.com/blazerye/DrugAssist, which we hope will pave the way for future research on applying LLMs to drug discovery.
{"title":"DrugAssist: a large language model for molecule optimization.","authors":"Geyan Ye, Xibao Cai, Houtim Lai, Xing Wang, Junhong Huang, Longyue Wang, Wei Liu, Xiangxiang Zeng","doi":"10.1093/bib/bbae693","journal":{"name":"Briefings in bioinformatics","volume":"26 1"},"publicationDate":"2024-11-22","publicationTypes":"Journal Article","openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11697106/pdf/"}
Si Zheng, Yaowen Gu, Yuzhen Gu, Yelin Zhao, Liang Li, Min Wang, Rui Jiang, Xia Yu, Ting Chen, Jiao Li
Drug resistance in Mycobacterium tuberculosis (Mtb) is a significant challenge in the control and treatment of tuberculosis, making efforts to combat the spread of this global health burden more difficult. To accelerate anti-tuberculosis drug discovery, repurposing clinically approved or investigational drugs for the treatment of tuberculosis by computational methods has become an attractive strategy. In this study, we developed a virtual screening workflow that combines multiple machine learning and deep learning models, and 11 576 compounds extracted from the DrugBank database were screened against Mtb. Our screening method produced satisfactory predictions on three data-splitting settings, with the top predicted bioactive compounds all being known antibacterial or anti-TB drugs. To further identify and evaluate drugs with repurposing potential in TB therapy, 15 screened compounds were selected for subsequent computational and experimental evaluations, of which aldoxorubicin and quarfloxin showed potent inhibition of Mtb strain H37Rv, with minimal inhibitory concentrations of 4.16 and 20.67 μM/mL, respectively. Encouragingly, these two compounds also showed antibacterial activity against multidrug-resistant TB isolates and exhibited strong antimicrobial activity against Mtb. Furthermore, molecular docking, molecular dynamics simulation, and surface plasmon resonance experiments validated the direct binding of the two compounds to Mtb DNA gyrase. In summary, our comprehensive virtual screening workflow successfully repurposed two novel drugs (aldoxorubicin and quarfloxin) as promising anti-Mtb candidates. The verification results provide useful information for the further development and clinical verification of anti-TB drugs.
{"title":"Machine learning-enabled virtual screening indicates the anti-tuberculosis activity of aldoxorubicin and quarfloxin with verification by molecular docking, molecular dynamics simulations, and biological evaluations.","authors":"Si Zheng, Yaowen Gu, Yuzhen Gu, Yelin Zhao, Liang Li, Min Wang, Rui Jiang, Xia Yu, Ting Chen, Jiao Li","doi":"10.1093/bib/bbae696","journal":{"name":"Briefings in bioinformatics","volume":"26 1"},"publicationDate":"2024-11-22","publicationTypes":"Journal Article","openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11684895/pdf/"}
Accurate and rapid taxonomic classifications are essential for systematically exploring organisms and metabolites in diverse environments. Many tools have been developed for building biological taxonomic trees, but they have limitations, and a streamlined method for constructing chemical taxonomic trees is notably absent. We present the iPhylo suite (https://www.iphylo.net/), a comprehensive, automated, and interactive platform for biological and chemical taxonomic analysis. The iPhylo suite features web-based modules for the interactive construction and annotation of taxonomic trees and a stand-alone command-line interface (CLI) for local operation or deployment on high-performance computing (HPC) clusters. iPhylo supports National Center for Biotechnology Information (NCBI) taxonomy for biological classification and ChemOnt and NPClassifier for chemical classification. The iPhylo visualization module, fully implemented in R, allows users to save progress locally and customize the underlying R code. Finally, the CLI module facilitates analysis across all hierarchical relational databases. We showcase the iPhylo suite's capabilities for visualizing environmental microbiomes, analyzing gut microbial metabolite synthesis preferences, and discovering novel correlations between the microbiome and the metabolome in humans and the environment. Overall, the iPhylo suite is distinguished by its unified and interactive framework for in-depth taxonomic and integrative analyses of biological and chemical features and beyond.
{"title":"The iPhylo suite: an interactive platform for building and annotating biological and chemical taxonomic trees.","authors":"Yueer Li, Chen Peng, Fei Chi, Zinuo Huang, Mengyi Yuan, Xin Zhou, Chao Jiang","doi":"10.1093/bib/bbae679","journal":{"name":"Briefings in bioinformatics","volume":"26 1"},"publicationDate":"2024-11-22","publicationTypes":"Journal Article","openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11684897/pdf/"}
Spatial transcriptomics is a rapidly evolving technology that simultaneously measures gene expression profiles and the spatial locations of spots. With progressive advances, current spatial transcriptomic techniques can achieve cellular or even subcellular resolution, making it possible to explore fine-grained spatial patterns of cell types within a single tissue section. However, most existing spatial cell clustering methods require the number of cell types to be specified correctly in advance, and this number is hard to determine in practical exploratory data analysis. To address this issue, we present BACT, a nonparametric Bayesian model that performs BAyesian Cell Typing using the gene expression information and spatial coordinates of cells. BACT incorporates a nonparametric Potts prior to induce spatial dependency among neighboring cells and, more importantly, learns the number of cell types directly from the data without prespecification. Evaluations on three single-cell spatial transcriptomic datasets demonstrate that BACT outperforms competing spatial cell typing methods. The R package and the user manual of BACT are publicly available at https://github.com/yinqiaoyan/BACT.
{"title":"BACT: nonparametric Bayesian cell typing for single-cell spatial transcriptomics data.","authors":"Yinqiao Yan, Xiangyu Luo","doi":"10.1093/bib/bbae689","DOIUrl":"10.1093/bib/bbae689","url":null,"abstract":"<p><p>The spatial transcriptomics is a rapidly evolving biological technology that simultaneously measures the gene expression profiles and the spatial locations of spots. With progressive advances, current spatial transcriptomic techniques can achieve the cellular or even the subcellular resolution, making it possible to explore the fine-grained spatial pattern of cell types within one tissue section. However, most existing cell spatial clustering methods require a correct specification of the cell type number, which is hard to determine in the practical exploratory data analysis. To address this issue, we present a nonparametric Bayesian model BACT to perform BAyesian Cell Typing by utilizing gene expression information and spatial coordinates of cells. BACT incorporates a nonparametric Potts prior to induce neighboring cells' spatial dependency, and, more importantly, it can automatically learn the cell type number directly from the data without prespecification. Evaluations on three single-cell spatial transcriptomic datasets demonstrate the better performance of BACT than competing spatial cell typing methods. 
The R package and the user manual of BACT are publicly available at https://github.com/yinqiaoyan/BACT.</p>","PeriodicalId":9209,"journal":{"name":"Briefings in bioinformatics","volume":"26 1","pages":""},"PeriodicalIF":6.8,"publicationDate":"2024-11-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11697130/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142920758","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"The role of automation in enhancing reproducibility and interoperability of PBPK models.","authors":"Abdallah Derbalah, Masoud Jamei, Iain Gardner, Armin Sepp","doi":"10.1093/bib/bbaf053","DOIUrl":"10.1093/bib/bbaf053","url":null,"abstract":"","PeriodicalId":9209,"journal":{"name":"Briefings in bioinformatics","volume":"26 1","pages":""},"PeriodicalIF":6.8,"publicationDate":"2024-11-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11808803/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143381594","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}