Pub Date: 2025-01-27 | DOI: 10.1186/s12859-025-06057-9
Qifeng Liu, Tao Zhou, Chi Cheng, Jin Ma, Marzia Hoque Tania
Background: Due to the complexity and cost of preparing histopathological slides, deep learning-based methods have been developed to generate high-quality histological images. However, existing approaches primarily focus on spatial domain information, neglecting the periodic information in the frequency domain and the complementary relationship between the two domains. In this paper, we propose a generative adversarial network that employs a cross-attention mechanism to extract and fuse features across the spatial and frequency domains. The method optimizes frequency domain features using spatial domain guidance and refines spatial features with frequency domain information, preserving key details while eliminating redundancy to generate high-quality histological images.
Results: Our model incorporates a variable-window mixed attention module to dynamically adjust attention window sizes, capturing both local details and global context. A spectral filtering module enhances the extraction of repetitive textures and periodic structures, while a cross-attention fusion module dynamically weights features from both domains, focusing on the most critical information to produce realistic and detailed images.
Conclusions: The proposed method achieves efficient spatial-frequency domain fusion, significantly improving image generation quality. Experiments on the Patch Camelyon dataset show superior performance over eight state-of-the-art models across five metrics. This approach advances automated histopathological image generation with potential for clinical applications.
{"title":"Hybrid generative adversarial network based on frequency and spatial domain for histopathological image synthesis.","authors":"Qifeng Liu, Tao Zhou, Chi Cheng, Jin Ma, Marzia Hoque Tania","doi":"10.1186/s12859-025-06057-9","DOIUrl":"https://doi.org/10.1186/s12859-025-06057-9","url":null,"abstract":"<p><strong>Background: </strong>Due to the complexity and cost of preparing histopathological slides, deep learning-based methods have been developed to generate high-quality histological images. However, existing approaches primarily focus on spatial domain information, neglecting the periodic information in the frequency domain and the complementary relationship between the two domains. In this paper, we proposed a generative adversarial network that employs a cross-attention mechanism to extract and fuse features across spatial and frequency domains. The method optimizes frequency domain features using spatial domain guidance and refines spatial features with frequency domain information, preserving key details while eliminating redundancy to generate high-quality histological images.</p><p><strong>Results: </strong>Our model incorporates a variable-window mixed attention module to dynamically adjust attention window sizes, capturing both local details and global context. A spectral filtering module enhances the extraction of repetitive textures and periodic structures, while a cross-attention fusion module dynamically weights features from both domains, focusing on the most critical information to produce realistic and detailed images.</p><p><strong>Conclusions: </strong>The proposed method achieves efficient spatial-frequency domain fusion, significantly improving image generation quality. Experiments on the Patch Camelyon dataset show superior performance over eight state-of-the-art models across five metrics. This approach advances automated histopathological image generation with potential for clinical applications.</p>","PeriodicalId":8958,"journal":{"name":"BMC Bioinformatics","volume":"26 1","pages":"29"},"PeriodicalIF":2.9,"publicationDate":"2025-01-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143051354","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2025-01-25 | DOI: 10.1186/s12859-025-06052-0
Jinchen Sun, Haoran Zheng
Background: Drug-drug interactions (DDIs), especially antagonistic ones, present significant risks to patient safety, underscoring the urgent need for reliable prediction methods. Recently, substructure-based DDI prediction has garnered much attention due to the dominant influence of functional groups and substructures on drug properties. However, existing approaches face challenges regarding the insufficient interpretability of the identified substructures and the isolation of chemical substructures.
Results: This study introduces a novel framework for DDI prediction termed HDN-DDI. HDN-DDI integrates an explainable substructure extraction module to decompose drug molecules and represents them using innovative hierarchical molecular graphs, which effectively incorporate information from real chemical substructures and improve the efficiency of molecule encoding. Furthermore, an enhanced dual-view learning method inspired by the underlying mechanisms of DDIs enables HDN-DDI to comprehensively capture both hierarchical structure and interaction information. Experimental results demonstrate that HDN-DDI achieves state-of-the-art performance, with accuracies of 97.90% and 99.38% on two widely used datasets in the warm-start setting. Moreover, HDN-DDI shows substantial improvements in the cold-start setting, with gains of 4.96% in accuracy and 7.08% in F1 score on previously unseen drugs. Real-world applications further highlight HDN-DDI's robust generalization to newly approved drugs.
Conclusion: With its accurate predictions and robust generalization across different settings, HDN-DDI shows promise for enhancing drug safety and efficacy. Future research will focus on refining decomposition rules as well as integrating external knowledge while preserving the model's generalization capabilities.
{"title":"HDN-DDI: a novel framework for predicting drug-drug interactions using hierarchical molecular graphs and enhanced dual-view representation learning.","authors":"Jinchen Sun, Haoran Zheng","doi":"10.1186/s12859-025-06052-0","DOIUrl":"10.1186/s12859-025-06052-0","url":null,"abstract":"<p><strong>Background: </strong>Drug-drug interactions (DDIs) especially antagonistic ones present significant risks to patient safety, underscoring the urgent need for reliable prediction methods. Recently, substructure-based DDI prediction has garnered much attention due to the dominant influence of functional groups and substructures on drug properties. However, existing approaches face challenges regarding the insufficient interpretability of identified substructures and the isolation of chemical substructures.</p><p><strong>Results: </strong>This study introduces a novel framework for DDI prediction termed HDN-DDI. HDN-DDI integrates an explainable substructure extraction module to decompose drug molecules and represents them using innovative hierarchical molecular graphs, which effectively incorporates information from real chemical substructures and improves molecules encoding efficiency. Furthermore, the enhanced dual-view learning method inspired by the underlying mechanisms of DDIs enables HDN-DDI to comprehensively capture both hierarchical structure and interaction information. Experimental results demonstrate that HDN-DDI has achieved state-of-the-art performance with accuracies of 97.90% and 99.38% on the two widely-used datasets in the warm-start setting. Moreover, HDN-DDI exhibits substantial improvements in the cold-start setting with boosts of 4.96% in accuracy and 7.08% in F1 score on previously unseen drugs. Real-world applications further highlight HDN-DDI's robust generalization capabilities towards newly approved drugs.</p><p><strong>Conclusion: </strong>With its accurate predictions and robust generalization across different settings, HDN-DDI shows promise for enhancing drug safety and efficacy. Future research will focus on refining decomposition rules as well as integrating external knowledge while preserving the model's generalization capabilities.</p>","PeriodicalId":8958,"journal":{"name":"BMC Bioinformatics","volume":"26 1","pages":"28"},"PeriodicalIF":2.9,"publicationDate":"2025-01-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11765940/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143036651","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2025-01-23 | DOI: 10.1186/s12859-025-06049-9
Tatiana A Semashko, Gleb Y Fisunov, Georgiy Y Shevelev, Vadim M Govorun
Background: Currently, synthetic genomics is a rapidly developing field. Its main tasks, such as the design of synthetic sequences and the assembly of DNA sequences from synthetic oligonucleotides, require specialized software. In this article, we present a program with a graphical interface that allows non-bioinformaticians to perform the tasks needed in synthetic genomics.
Results: We developed BAC-browser v.2.1. It supports the design of nucleotide sequences and offers the following tools: generation of a nucleotide sequence from an amino acid sequence using a codon frequency table for a specific organism, as well as visualization of restriction sites, GC composition, GC skew and secondary structure. For the assembly of DNA sequences, a fragmentation tool was created with two modes: regular breakdown into oligonucleotides of a fixed length and breakdown into oligonucleotides with thermodynamic alignment. We demonstrate the assembly of DNA fragments designed in the different modes of BAC-browser.
Conclusions: BAC-browser offers a large number of tools for work in the field of synthetic genomics and is freely available at https://sysbiomed.ru/upload/BAC-browser-2.1.zip .
{"title":"BAC-browser: the tool for synthetic biology.","authors":"Tatiana A Semashko, Gleb Y Fisunov, Georgiy Y Shevelev, Vadim M Govorun","doi":"10.1186/s12859-025-06049-9","DOIUrl":"10.1186/s12859-025-06049-9","url":null,"abstract":"<p><strong>Background: </strong>Currently, synthetic genomics is a rapidly developing field. Its main tasks, such as the design of synthetic sequences and the assembly of DNA sequences from synthetic oligonucleotides, require specialized software. In this article, we present a program with a graphical interface that allows non-bioinformatics to perform the tasks needed in synthetic genomics.</p><p><strong>Results: </strong>We developed BAC-browser v.2.1. It helps to design nucleotide sequences and features the following tools: generate nucleotide sequence from amino acid sequences using a codon frequency table for a specific organism, as well as visualization of restriction sites, GC composition, GC skew and secondary structure. To assemble DNA sequences, a fragmentation tool was created: regular breakdown into oligonucleotides of a certain length and breakdown into oligonucleotides with thermodynamic alignment. We demonstrate the possibility of DNA fragments assemblies designed in different modes of BAC-browser.</p><p><strong>Conclusions: </strong>The BAC-browser has a large number of tools for working in the field of systemic genomics and is freely available at the link with a direct link https://sysbiomed.ru/upload/BAC-browser-2.1.zip .</p>","PeriodicalId":8958,"journal":{"name":"BMC Bioinformatics","volume":"26 1","pages":"27"},"PeriodicalIF":2.9,"publicationDate":"2025-01-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11758742/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143027802","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2025-01-22 | DOI: 10.1186/s12859-024-06026-8
Clémence Réda, Jill-Jênn Vie, Olaf Wolkenhauer
Background: Interpretability is a topical question in recommender systems, especially in healthcare applications. An interpretable classifier quantifies the importance of each input feature for the predicted item-user association in a non-ambiguous fashion.
Results: We introduce the novel Joint Embedding Learning-classifier for improved Interpretability (JELI). By combining the training of a structured collaborative-filtering classifier and an embedding learning task, JELI predicts new user-item associations based on jointly learned item and user embeddings while providing feature-wise importance scores. Therefore, JELI flexibly allows the introduction of priors on the connections between users, items, and features. In particular, JELI simultaneously (a) learns feature, item, and user embeddings; (b) predicts new item-user associations; (c) provides importance scores for each feature. Moreover, JELI instantiates a generic approach to training recommender systems by encoding generic graph-regularization constraints.
Conclusions: First, we show that the joint training approach yields a gain in the predictive power of the downstream classifier. Second, JELI can recover feature-association dependencies. Finally, JELI requires fewer parameters than the baselines on synthetic and drug-repurposing data sets.
{"title":"Joint embedding-classifier learning for interpretable collaborative filtering.","authors":"Clémence Réda, Jill-Jênn Vie, Olaf Wolkenhauer","doi":"10.1186/s12859-024-06026-8","DOIUrl":"10.1186/s12859-024-06026-8","url":null,"abstract":"<p><strong>Background: </strong>Interpretability is a topical question in recommender systems, especially in healthcare applications. An interpretable classifier quantifies the importance of each input feature for the predicted item-user association in a non-ambiguous fashion.</p><p><strong>Results: </strong>We introduce the novel Joint Embedding Learning-classifier for improved Interpretability (JELI). By combining the training of a structured collaborative-filtering classifier and an embedding learning task, JELI predicts new user-item associations based on jointly learned item and user embeddings while providing feature-wise importance scores. Therefore, JELI flexibly allows the introduction of priors on the connections between users, items, and features. In particular, JELI simultaneously (a) learns feature, item, and user embeddings; (b) predicts new item-user associations; (c) provides importance scores for each feature. Moreover, JELI instantiates a generic approach to training recommender systems by encoding generic graph-regularization constraints.</p><p><strong>Conclusions: </strong>First, we show that the joint training approach yields a gain in the predictive power of the downstream classifier. Second, JELI can recover feature-association dependencies. Finally, JELI induces a restriction in the number of parameters compared to baselines in synthetic and drug-repurposing data sets.</p>","PeriodicalId":8958,"journal":{"name":"BMC Bioinformatics","volume":"26 1","pages":"26"},"PeriodicalIF":2.9,"publicationDate":"2025-01-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11755841/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143021825","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2025-01-22 | DOI: 10.1186/s12859-024-05991-4
Azam Shirali, Vitalii Stebliankin, Ukesh Karki, Jimeng Shi, Prem Chapagain, Giri Narasimhan
Background: While protein-protein docking is fundamental to our understanding of how proteins interact, scoring protein-protein complex conformations is a critical component of successful docking programs. Without accurate and efficient scoring functions to differentiate between native and non-native binding complexes, the accuracy of current docking tools cannot be guaranteed. Although many innovative scoring functions have been proposed, a good scoring function for docking remains elusive. Deep learning models offer alternatives to using explicit empirical or mathematical functions for scoring protein-protein complexes.
Results: In this study, we perform a comprehensive survey of the state-of-the-art scoring functions by considering the most popular and highly performant approaches, both classical and deep learning-based, for scoring protein-protein complexes. The methods were also compared based on their runtime as it directly impacts their use in large-scale docking applications.
Conclusions: We evaluate the strengths and weaknesses of classical and deep learning-based approaches across seven public and popular datasets to aid researchers in understanding the progress made in this field.
{"title":"A comprehensive survey of scoring functions for protein docking models.","authors":"Azam Shirali, Vitalii Stebliankin, Ukesh Karki, Jimeng Shi, Prem Chapagain, Giri Narasimhan","doi":"10.1186/s12859-024-05991-4","DOIUrl":"10.1186/s12859-024-05991-4","url":null,"abstract":"<p><strong>Background: </strong>While protein-protein docking is fundamental to our understanding of how proteins interact, scoring protein-protein complex conformations is a critical component of successful docking programs. Without accurate and efficient scoring functions to differentiate between native and non-native binding complexes, the accuracy of current docking tools cannot be guaranteed. Although many innovative scoring functions have been proposed, a good scoring function for docking remains elusive. Deep learning models offer alternatives to using explicit empirical or mathematical functions for scoring protein-protein complexes.</p><p><strong>Results: </strong>In this study, we perform a comprehensive survey of the state-of-the-art scoring functions by considering the most popular and highly performant approaches, both classical and deep learning-based, for scoring protein-protein complexes. The methods were also compared based on their runtime as it directly impacts their use in large-scale docking applications.</p><p><strong>Conclusions: </strong>We evaluate the strengths and weaknesses of classical and deep learning-based approaches across seven public and popular datasets to aid researchers in understanding the progress made in this field.</p>","PeriodicalId":8958,"journal":{"name":"BMC Bioinformatics","volume":"26 1","pages":"25"},"PeriodicalIF":2.9,"publicationDate":"2025-01-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11755896/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143021679","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2025-01-21 | DOI: 10.1186/s12859-025-06039-x
Stefan Pastore, Philipp Hillenbrand, Nils Molnar, Irina Kovlyagina, Monika Chanu Chongtham, Stanislav Sys, Beat Lutz, Margarita Tevosian, Susanne Gerber
Background: Tissue clearing combined with light-sheet microscopy is gaining popularity among neuroscientists interested in unbiased assessment of their samples in 3D volume. However, the analysis of such data remains a challenge. ClearMap and CellFinder are tools for analyzing neuronal activity maps in an intact volume of cleared mouse brains. However, these tools lack a user interface, restricting accessibility primarily to scientists proficient in advanced Python programming. The application presented here aims to bridge this gap and make data analysis accessible to a wider scientific community.
Results: We developed an easy-to-adopt graphical user interface for cell quantification and group analysis of whole cleared adult mouse brains. Fundamental statistical analyses, such as PCA and box plots, and additional visualization features allow for quick data evaluation and quality checks. Furthermore, we present a use case of the ClearFinder GUI for cross-analyzing the same samples with two cell counting tools, highlighting the discrepancies in cell detection efficiency between them.
Conclusions: Our easily accessible tool allows more researchers to implement the methodology, troubleshoot arising issues, and develop quality checks, benchmarking, and standardized analysis pipelines for cell detection and region annotation in whole volumes of cleared brains.
{"title":"ClearFinder: a Python GUI for annotating cells in cleared mouse brain.","authors":"Stefan Pastore, Philipp Hillenbrand, Nils Molnar, Irina Kovlyagina, Monika Chanu Chongtham, Stanislav Sys, Beat Lutz, Margarita Tevosian, Susanne Gerber","doi":"10.1186/s12859-025-06039-x","DOIUrl":"10.1186/s12859-025-06039-x","url":null,"abstract":"<p><strong>Background: </strong>Tissue clearing combined with light-sheet microscopy is gaining popularity among neuroscientists interested in unbiased assessment of their samples in 3D volume. However, the analysis of such data remains a challenge. ClearMap and CellFinder are tools for analyzing neuronal activity maps in an intact volume of cleared mouse brains. However, these tools lack a user interface, restricting accessibility primarily to scientists proficient in advanced Python programming. The application presented here aims to bridge this gap and make data analysis accessible to a wider scientific community.</p><p><strong>Results: </strong>We developed an easy-to-adopt graphical user interface for cell quantification and group analysis of whole cleared adult mouse brains. Fundamental statistical analysis, such as PCA and box plots, and additional visualization features allow for quick data evaluation and quality checks. Furthermore, we present a use case of ClearFinder GUI for cross-analyzing the same samples with two cell counting tools, highlighting the discrepancies in cell detection efficiency between them.</p><p><strong>Conclusions: </strong>Our easily accessible tool allows more researchers to implement the methodology, troubleshoot arising issues, and develop quality checks, benchmarking, and standardized analysis pipelines for cell detection and region annotation in whole volumes of cleared brains.</p>","PeriodicalId":8958,"journal":{"name":"BMC Bioinformatics","volume":"26 1","pages":"24"},"PeriodicalIF":2.9,"publicationDate":"2025-01-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11753021/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142999538","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2025-01-21 | DOI: 10.1186/s12859-024-06015-x
Xiao Zhang, Qian Liu
Background: Comprehensively mapping the hierarchical structure of breast cancer protein communities and identifying potential biomarkers from them is a promising avenue for breast cancer research. Existing approaches are subjective and fail to take information from protein sequences into consideration. Deep learning can automatically learn features from protein sequences and protein-protein interactions for hierarchical clustering.
Results: Using a large amount of publicly available proteomics data, we created a hierarchical tree for breast cancer protein communities using a novel hierarchical graph neural network, with the supervision of gene ontology terms and assistance of a pre-trained deep contextual language model. Then, a group-lasso algorithm was applied to identify protein communities that are under both mutation burden and survival burden, undergo significant alterations when targeted by specific drug molecules, and show cancer-dependent perturbations. The resulting hierarchical map of protein communities shows how gene-level mutations and survival information converge on protein communities at different scales. Internal validity of the model was established through the convergence on BRCA2 as a breast cancer hotspot. Further overlaps with breast cancer cell dependencies revealed SUPT6H and RAD21, along with their respective protein systems, HOST:37 and HOST:861, as potential biomarkers. Using gene-level perturbation data of the HOST:37 and HOST:861 gene sets, three FDA-approved drugs with high therapeutic value were selected as potential treatments to be further evaluated. These drugs include mercaptopurine, pioglitazone, and colchicine.
Conclusion: The proposed graph neural network approach to analyzing breast cancer protein communities in a hierarchical structure provides a novel perspective on breast cancer prognosis and treatment. By targeting entire gene sets, we were able to evaluate the prognostic and therapeutic value of genes (or gene sets) at different levels, from gene-level to system-level biology. Cancer-specific gene dependencies provide additional context for pinpointing cancer-related systems and drug-induced alterations can highlight potential therapeutic targets. These identified protein communities, in conjunction with other protein communities under strong mutation and survival burdens, can potentially be used as clinical biomarkers for breast cancer.
{"title":"A graph neural network approach for hierarchical mapping of breast cancer protein communities.","authors":"Xiao Zhang, Qian Liu","doi":"10.1186/s12859-024-06015-x","DOIUrl":"10.1186/s12859-024-06015-x","url":null,"abstract":"<p><strong>Background: </strong>Comprehensively mapping the hierarchical structure of breast cancer protein communities and identifying potential biomarkers from them is a promising way for breast cancer research. Existing approaches are subjective and fail to take information from protein sequences into consideration. Deep learning can automatically learn features from protein sequences and protein-protein interactions for hierarchical clustering.</p><p><strong>Results: </strong>Using a large amount of publicly available proteomics data, we created a hierarchical tree for breast cancer protein communities using a novel hierarchical graph neural network, with the supervision of gene ontology terms and assistance of a pre-trained deep contextual language model. Then, a group-lasso algorithm was applied to identify protein communities that are under both mutation burden and survival burden, undergo significant alterations when targeted by specific drug molecules, and show cancer-dependent perturbations. The resulting hierarchical map of protein communities shows how gene-level mutations and survival information converge on protein communities at different scales. Internal validity of the model was established through the convergence on BRCA2 as a breast cancer hotspot. Further overlaps with breast cancer cell dependencies revealed SUPT6H and RAD21, along with their respective protein systems, HOST:37 and HOST:861, as potential biomarkers. Using gene-level perturbation data of the HOST:37 and HOST:861 gene sets, three FDA-approved drugs with high therapeutic value were selected as potential treatments to be further evaluated. These drugs include mercaptopurine, pioglitazone, and colchicine.</p><p><strong>Conclusion: </strong>The proposed graph neural network approach to analyzing breast cancer protein communities in a hierarchical structure provides a novel perspective on breast cancer prognosis and treatment. By targeting entire gene sets, we were able to evaluate the prognostic and therapeutic value of genes (or gene sets) at different levels, from gene-level to system-level biology. Cancer-specific gene dependencies provide additional context for pinpointing cancer-related systems and drug-induced alterations can highlight potential therapeutic targets. These identified protein communities, in conjunction with other protein communities under strong mutation and survival burdens, can potentially be used as clinical biomarkers for breast cancer.</p>","PeriodicalId":8958,"journal":{"name":"BMC Bioinformatics","volume":"26 1","pages":"23"},"PeriodicalIF":2.9,"publicationDate":"2025-01-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11749236/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142999534","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2025-01-20 | DOI: 10.1186/s12859-025-06044-0
Jinming Cheng, Xinyi Jin, Gordon K Smyth, Yunshun Chen
Background: Imaging-based spatial transcriptomics technologies allow us to explore spatial gene expression profiles at the cellular level. Cell type annotation of imaging-based spatial data is challenging due to the small gene panel, but it is a crucial step for downstream analyses. Many good reference-based cell type annotation tools have been developed for single-cell RNA sequencing and sequencing-based spatial transcriptomics data. However, the performance of the reference-based cell type annotation tools on imaging-based spatial transcriptomics data has not been well studied yet.
Results: We compared the performance of five reference-based methods (SingleR, Azimuth, RCTD, scPred and scmapCell) with marker-gene-based manual annotation on an imaging-based Xenium dataset of human breast cancer. We demonstrate a practical workflow for preparing a high-quality single-cell RNA reference, evaluating annotation accuracy, and estimating the running time of reference-based cell type annotation tools.
Conclusions: SingleR was the best performing reference-based cell type annotation tool for the Xenium platform, being fast, accurate and easy to use, with results closely matching those of manual annotation.
{"title":"Benchmarking cell type annotation methods for 10x Xenium spatial transcriptomics data.","authors":"Jinming Cheng, Xinyi Jin, Gordon K Smyth, Yunshun Chen","doi":"10.1186/s12859-025-06044-0","DOIUrl":"10.1186/s12859-025-06044-0","url":null,"abstract":"<p><strong>Background: </strong>Imaging-based spatial transcriptomics technologies allow us to explore spatial gene expression profiles at the cellular level. Cell type annotation of imaging-based spatial data is challenging due to the small gene panel, but it is a crucial step for downstream analyses. Many good reference-based cell type annotation tools have been developed for single-cell RNA sequencing and sequencing-based spatial transcriptomics data. However, the performance of the reference-based cell type annotation tools on imaging-based spatial transcriptomics data has not been well studied yet.</p><p><strong>Results: </strong>We compared performance of five reference-based methods (SingleR, Azimuth, RCTD, scPred and scmapCell) with the marker-gene-based manual annotation method on an imaging-based Xenium data of human breast cancer. A practical workflow has been demonstrated for preparing a high-quality single-cell RNA reference, evaluating the accuracy, and estimating the running time for reference-based cell type annotation tools.</p><p><strong>Conclusions: </strong>SingleR was the best performing reference-based cell type annotation tool for the Xenium platform, being fast, accurate and easy to use, with results closely matching those of manual annotation.</p>","PeriodicalId":8958,"journal":{"name":"BMC Bioinformatics","volume":"26 1","pages":"22"},"PeriodicalIF":2.9,"publicationDate":"2025-01-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11744978/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142999536","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2025-01-18 | DOI: 10.1186/s12859-025-06035-1
T J Booth, S Shaw, P Cruz-Morales, T Weber
Background: The increasing amount of genomic data calls for tools that can create genome-scale phylogenies quickly and efficiently. Existing tools rely on large reference databases or require lengthy de novo calculations to identify orthologues, meaning that they have long run times and are limited in their taxonomic scope. To address this, we created getphylo, a Python tool for the rapid generation of phylogenetic trees de novo from annotated sequences.
Results: We present getphylo (Genbank to Phylogeny), a tool that automatically builds phylogenetic trees from annotated genomes alone. Orthologues are identified heuristically by searching for singletons (single copy genes) across all input genomes and the phylogeny is inferred from a concatenated alignment of all coding sequences by maximum likelihood. We performed a thorough benchmarking of getphylo against two existing tools, autoMLST and GTDB-tk, to show that it can produce trees of comparable quality in a fraction of the time. We also demonstrate the flexibility of getphylo across four case studies including bacterial and eukaryotic genomes, and biosynthetic gene clusters.
Conclusions: getphylo is a quick and reliable tool for the automated generation of genome-scale phylogenetic trees. getphylo can produce phylogenies comparable to those of other software in a fraction of the time, without the need for large local databases or intensive computation. getphylo can rapidly identify orthologues from a wide variety of datasets regardless of taxonomic or genomic scope. The usability, speed, and flexibility of getphylo make it a valuable addition to the phylogenetics toolkit.
{"title":"getphylo: rapid and automatic generation of multi-locus phylogenetic trees.","authors":"T J Booth, S Shaw, P Cruz-Morales, T Weber","doi":"10.1186/s12859-025-06035-1","DOIUrl":"10.1186/s12859-025-06035-1","url":null,"abstract":"<p><strong>Background: </strong>The increasing amount of genomic data calls for tools that can create genome-scale phylogenies quickly and efficiently. Existing tools rely on large reference databases or require lengthy de novo calculations to identify orthologues, meaning that they have long run times and are limited in their taxonomic scope. To address this, we created getphylo, a python tool for the rapid generation of phylogenetic trees de novo from annotated sequences.</p><p><strong>Results: </strong>We present getphylo (Genbank to Phylogeny), a tool that automatically builds phylogenetic trees from annotated genomes alone. Orthologues are identified heuristically by searching for singletons (single copy genes) across all input genomes and the phylogeny is inferred from a concatenated alignment of all coding sequences by maximum likelihood. We performed a thorough benchmarking of getphylo against two existing tools, autoMLST and GTDB-tk, to show that it can produce trees of comparable quality in a fraction of the time. We also demonstrate the flexibility of getphylo across four case studies including bacterial and eukaryotic genomes, and biosynthetic gene clusters.</p><p><strong>Conclusions: </strong>getphylo is a quick and reliable tool for the automated generation of genome-scale phylogenetic trees. getphylo can produce phylogenies comparable to other software in a fraction of the time, without the need large local databases or intense computation. getphylo can rapidly identify orthologues from a wide variety of datasets regardless of taxonomic or genomic scope. The usability, speed, flexibility of getphylo makes it a valuable addition to the phylogenetics toolkit.</p>","PeriodicalId":8958,"journal":{"name":"BMC Bioinformatics","volume":"26 1","pages":"21"},"PeriodicalIF":2.9,"publicationDate":"2025-01-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11748604/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142999449","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2025-01-17 | DOI: 10.1186/s12859-024-05987-0
Dhekra Saeed, Huanlai Xing, Barakat AlBadani, Li Feng, Raeed Al-Sabri, Monir Abdullah, Amir Rehman
Background: Drug response prediction is critical in precision medicine to determine the most effective and safe treatments for individual patients. Traditional prediction methods relying on demographic and genetic data often fall short in accuracy and robustness. Recent graph-based models, while promising, frequently neglect the critical role of atomic interactions and fail to integrate drug fingerprints with SMILES for comprehensive molecular graph construction.
Results: We introduce multimodal multi-channel graph attention network with adaptive fusion (MGATAF), a framework designed to enhance drug response predictions by capturing both local and global interactions among graph nodes. MGATAF improves drug representation by integrating SMILES and fingerprints, resulting in more precise predictions of drug effects. The methodology involves constructing multimodal molecular graphs, employing multi-channel graph attention networks to capture diverse interactions, and using adaptive fusion to integrate these interactions at multiple abstraction levels. Empirical results demonstrate MGATAF's superior performance compared to traditional and other graph-based techniques. For example, on the GDSC dataset, MGATAF achieved a 5.12% improvement in the Pearson correlation coefficient (PCC), reaching 0.9312 with an RMSE of 0.0225. Similarly, in new cell-line tests, MGATAF outperformed baselines with a PCC of 0.8536 and an RMSE of 0.0321 on the GDSC dataset, and a PCC of 0.7364 with an RMSE of 0.0531 on the CCLE dataset.
Conclusions: MGATAF significantly advances drug response prediction by effectively integrating multiple molecular data types and capturing complex interactions. This framework enhances prediction accuracy and offers a robust tool for personalized medicine, potentially leading to more effective and safer treatments for patients. Future research can expand on this work by exploring additional data modalities and refining the adaptive fusion mechanisms.
{"title":"MGATAF: multi-channel graph attention network with adaptive fusion for cancer-drug response prediction.","authors":"Dhekra Saeed, Huanlai Xing, Barakat AlBadani, Li Feng, Raeed Al-Sabri, Monir Abdullah, Amir Rehman","doi":"10.1186/s12859-024-05987-0","DOIUrl":"10.1186/s12859-024-05987-0","url":null,"abstract":"<p><strong>Background: </strong>Drug response prediction is critical in precision medicine to determine the most effective and safe treatments for individual patients. Traditional prediction methods relying on demographic and genetic data often fall short in accuracy and robustness. Recent graph-based models, while promising, frequently neglect the critical role of atomic interactions and fail to integrate drug fingerprints with SMILES for comprehensive molecular graph construction.</p><p><strong>Results: </strong>We introduce multimodal multi-channel graph attention network with adaptive fusion (MGATAF), a framework designed to enhance drug response predictions by capturing both local and global interactions among graph nodes. MGATAF improves drug representation by integrating SMILES and fingerprints, resulting in more precise predictions of drug effects. The methodology involves constructing multimodal molecular graphs, employing multi-channel graph attention networks to capture diverse interactions, and using adaptive fusion to integrate these interactions at multiple abstraction levels. Empirical results demonstrate MGATAF's superior performance compared to traditional and other graph-based techniques. For example, on the GDSC dataset, MGATAF achieved a 5.12% improvement in the Pearson correlation coefficient (PCC), reaching 0.9312 with an RMSE of 0.0225. Similarly, in new cell-line tests, MGATAF outperformed baselines with a PCC of 0.8536 and an RMSE of 0.0321 on the GDSC dataset, and a PCC of 0.7364 with an RMSE of 0.0531 on the CCLE dataset.</p><p><strong>Conclusions: </strong>MGATAF significantly advances drug response prediction by effectively integrating multiple molecular data types and capturing complex interactions. This framework enhances prediction accuracy and offers a robust tool for personalized medicine, potentially leading to more effective and safer treatments for patients. Future research can expand on this work by exploring additional data modalities and refining the adaptive fusion mechanisms.</p>","PeriodicalId":8958,"journal":{"name":"BMC Bioinformatics","volume":"26 1","pages":"19"},"PeriodicalIF":2.9,"publicationDate":"2025-01-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11742231/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142999452","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}