BMC Bioinformatics: latest articles

Hybrid generative adversarial network based on frequency and spatial domain for histopathological image synthesis.

IF 2.9, Zone 3 (Biology), Q2 Biochemical Research Methods

BMC Bioinformatics

Pub Date : 2025-01-27 DOI: 10.1186/s12859-025-06057-9

Qifeng Liu, Tao Zhou, Chi Cheng, Jin Ma, Marzia Hoque Tania

Background: Due to the complexity and cost of preparing histopathological slides, deep learning-based methods have been developed to generate high-quality histological images. However, existing approaches primarily focus on spatial domain information, neglecting the periodic information in the frequency domain and the complementary relationship between the two domains. In this paper, we propose a generative adversarial network that employs a cross-attention mechanism to extract and fuse features across the spatial and frequency domains. The method optimizes frequency domain features using spatial domain guidance and refines spatial features with frequency domain information, preserving key details while eliminating redundancy to generate high-quality histological images.

Results: Our model incorporates a variable-window mixed attention module to dynamically adjust attention window sizes, capturing both local details and global context. A spectral filtering module enhances the extraction of repetitive textures and periodic structures, while a cross-attention fusion module dynamically weights features from both domains, focusing on the most critical information to produce realistic and detailed images.
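To illustrate why repetitive textures are easier to pick out in the frequency domain, here is a minimal standard-library sketch. It is not the paper's spectral filtering module: the function names `dft_magnitudes` and `dominant_frequency` are hypothetical, and a naive 1D discrete Fourier transform stands in for whatever 2D transform the authors use.

```python
import cmath
import math
from typing import List, Sequence


def dft_magnitudes(signal: Sequence[float]) -> List[float]:
    """Return the magnitude of each bin of a naive O(n^2) discrete Fourier transform."""
    n = len(signal)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(signal)))
        for k in range(n)
    ]


def dominant_frequency(signal: Sequence[float]) -> int:
    """Return the non-zero frequency bin (up to n // 2) with the largest magnitude."""
    mags = dft_magnitudes(signal)
    half = len(signal) // 2
    return max(range(1, half + 1), key=lambda k: mags[k])


# A row of pixel intensities with a texture that repeats 4 times across 32 pixels
# (illustrative data, not taken from the paper).
texture_row = [math.sin(2 * math.pi * 4 * t / 32) for t in range(32)]
```

A periodic pattern that is spread across every pixel in the spatial domain collapses into a single strong bin in the spectrum, which is the kind of signal a frequency-domain branch can weight directly.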

Conclusions: The proposed method achieves efficient spatial-frequency domain fusion, significantly improving image generation quality. Experiments on the Patch Camelyon dataset show superior performance over eight state-of-the-art models across five metrics. This approach advances automated histopathological image generation with potential for clinical applications.

Vol. 26(1), p. 29. Publication type: Journal Article.

Citations: 0

HDN-DDI: a novel framework for predicting drug-drug interactions using hierarchical molecular graphs and enhanced dual-view representation learning.

IF 2.9, Zone 3 (Biology), Q2 Biochemical Research Methods

BMC Bioinformatics

Pub Date : 2025-01-25 DOI: 10.1186/s12859-025-06052-0

Jinchen Sun, Haoran Zheng

Background: Drug-drug interactions (DDIs), especially antagonistic ones, present significant risks to patient safety, underscoring the urgent need for reliable prediction methods. Recently, substructure-based DDI prediction has garnered much attention due to the dominant influence of functional groups and substructures on drug properties. However, existing approaches face challenges regarding the insufficient interpretability of identified substructures and the isolation of chemical substructures.

Results: This study introduces a novel framework for DDI prediction termed HDN-DDI. HDN-DDI integrates an explainable substructure extraction module to decompose drug molecules and represents them using innovative hierarchical molecular graphs, which effectively incorporate information from real chemical substructures and improve molecular encoding efficiency. Furthermore, the enhanced dual-view learning method, inspired by the underlying mechanisms of DDIs, enables HDN-DDI to comprehensively capture both hierarchical structure and interaction information. Experimental results demonstrate that HDN-DDI achieves state-of-the-art performance, with accuracies of 97.90% and 99.38% on two widely used datasets in the warm-start setting. Moreover, HDN-DDI exhibits substantial improvements in the cold-start setting, with boosts of 4.96% in accuracy and 7.08% in F1 score on previously unseen drugs. Real-world applications further highlight HDN-DDI's robust generalization capabilities towards newly approved drugs.

Conclusion: With its accurate predictions and robust generalization across different settings, HDN-DDI shows promise for enhancing drug safety and efficacy. Future research will focus on refining decomposition rules as well as integrating external knowledge while preserving the model's generalization capabilities.

Vol. 26(1), p. 28. Publication type: Journal Article. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11765940/pdf/

Citations: 0

BAC-browser: the tool for synthetic biology.

IF 2.9, Zone 3 (Biology), Q2 Biochemical Research Methods

BMC Bioinformatics

Pub Date : 2025-01-23 DOI: 10.1186/s12859-025-06049-9

Tatiana A Semashko, Gleb Y Fisunov, Georgiy Y Shevelev, Vadim M Govorun

Background: Synthetic genomics is a rapidly developing field. Its main tasks, such as the design of synthetic sequences and the assembly of DNA sequences from synthetic oligonucleotides, require specialized software. In this article, we present a program with a graphical interface that allows non-bioinformaticians to perform the tasks needed in synthetic genomics.

Results: We developed BAC-browser v.2.1. It helps to design nucleotide sequences and offers the following tools: generation of nucleotide sequences from amino acid sequences using a codon frequency table for a specific organism, and visualization of restriction sites, GC composition, GC skew, and secondary structure. To assemble DNA sequences, a fragmentation tool was created with two modes: regular breakdown into oligonucleotides of a fixed length, and breakdown into oligonucleotides with thermodynamic alignment. We demonstrate the feasibility of assembling DNA fragments designed in the different modes of BAC-browser.
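The sequence statistics and the regular fragmentation mode named above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not BAC-browser's code: the function names are hypothetical, GC skew uses the common definition (G - C) / (G + C), and fragments are simple non-overlapping slices, whereas a real assembly design would typically need overlapping oligonucleotides.

```python
from typing import List


def gc_content(seq: str) -> float:
    """Fraction of bases that are G or C (0.0 for an empty sequence)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def gc_skew(seq: str) -> float:
    """GC skew, (G - C) / (G + C); 0.0 when the sequence has no G or C."""
    seq = seq.upper()
    g, c = seq.count("G"), seq.count("C")
    return (g - c) / (g + c) if (g + c) else 0.0


def fragment(seq: str, length: int) -> List[str]:
    """Split seq into consecutive, non-overlapping pieces of at most `length` bases."""
    if length <= 0:
        raise ValueError("length must be positive")
    return [seq[i:i + length] for i in range(0, len(seq), length)]
```

In practice these statistics are usually computed over a sliding window along the sequence so they can be plotted; the whole-sequence versions here keep the sketch short.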

Conclusions: BAC-browser offers a large number of tools for work in synthetic genomics and is freely available as a direct download (https://sysbiomed.ru/upload/BAC-browser-2.1.zip).

Vol. 26(1), p. 27. Publication type: Journal Article. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11758742/pdf/

Citations: 0

Joint embedding-classifier learning for interpretable collaborative filtering.

IF 2.9, Zone 3 (Biology), Q2 Biochemical Research Methods

BMC Bioinformatics

Pub Date : 2025-01-22 DOI: 10.1186/s12859-024-06026-8

Clémence Réda, Jill-Jênn Vie, Olaf Wolkenhauer

Background: Interpretability is a topical question in recommender systems, especially in healthcare applications. An interpretable classifier quantifies the importance of each input feature for the predicted item-user association in a non-ambiguous fashion.

Results: We introduce the novel Joint Embedding Learning-classifier for improved Interpretability (JELI). By combining the training of a structured collaborative-filtering classifier and an embedding learning task, JELI predicts new user-item associations based on jointly learned item and user embeddings while providing feature-wise importance scores. Therefore, JELI flexibly allows the introduction of priors on the connections between users, items, and features. In particular, JELI simultaneously (a) learns feature, item, and user embeddings; (b) predicts new item-user associations; (c) provides importance scores for each feature. Moreover, JELI instantiates a generic approach to training recommender systems by encoding generic graph-regularization constraints.

Conclusions: First, we show that the joint training approach yields a gain in the predictive power of the downstream classifier. Second, JELI can recover feature-association dependencies. Finally, JELI restricts the number of parameters relative to baselines on synthetic and drug-repurposing data sets.

Vol. 26(1), p. 26. Publication type: Journal Article. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11755841/pdf/

Citations: 0

A comprehensive survey of scoring functions for protein docking models.

IF 2.9, Zone 3 (Biology), Q2 Biochemical Research Methods

BMC Bioinformatics

Pub Date : 2025-01-22 DOI: 10.1186/s12859-024-05991-4

Azam Shirali, Vitalii Stebliankin, Ukesh Karki, Jimeng Shi, Prem Chapagain, Giri Narasimhan

Background: While protein-protein docking is fundamental to our understanding of how proteins interact, scoring protein-protein complex conformations is a critical component of successful docking programs. Without accurate and efficient scoring functions to differentiate between native and non-native binding complexes, the accuracy of current docking tools cannot be guaranteed. Although many innovative scoring functions have been proposed, a good scoring function for docking remains elusive. Deep learning models offer alternatives to using explicit empirical or mathematical functions for scoring protein-protein complexes.

Results: In this study, we perform a comprehensive survey of the state-of-the-art scoring functions by considering the most popular and highly performant approaches, both classical and deep learning-based, for scoring protein-protein complexes. The methods were also compared based on their runtime as it directly impacts their use in large-scale docking applications.

Conclusions: We evaluate the strengths and weaknesses of classical and deep learning-based approaches across seven public and popular datasets to aid researchers in understanding the progress made in this field.

Vol. 26(1), p. 25. Publication type: Journal Article. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11755896/pdf/

Citations: 0

ClearFinder: a Python GUI for annotating cells in cleared mouse brain.

IF 2.9, Zone 3 (Biology), Q2 Biochemical Research Methods

BMC Bioinformatics

Pub Date : 2025-01-21 DOI: 10.1186/s12859-025-06039-x

Stefan Pastore, Philipp Hillenbrand, Nils Molnar, Irina Kovlyagina, Monika Chanu Chongtham, Stanislav Sys, Beat Lutz, Margarita Tevosian, Susanne Gerber

Background: Tissue clearing combined with light-sheet microscopy is gaining popularity among neuroscientists interested in unbiased assessment of their samples in 3D volume. However, the analysis of such data remains a challenge. ClearMap and CellFinder are tools for analyzing neuronal activity maps in an intact volume of cleared mouse brains. However, these tools lack a user interface, restricting accessibility primarily to scientists proficient in advanced Python programming. The application presented here aims to bridge this gap and make data analysis accessible to a wider scientific community.

Results: We developed an easy-to-adopt graphical user interface for cell quantification and group analysis of whole cleared adult mouse brains. Fundamental statistical analysis, such as PCA and box plots, and additional visualization features allow for quick data evaluation and quality checks. Furthermore, we present a use case of ClearFinder GUI for cross-analyzing the same samples with two cell counting tools, highlighting the discrepancies in cell detection efficiency between them.

Conclusions: Our easily accessible tool allows more researchers to implement the methodology, troubleshoot arising issues, and develop quality checks, benchmarking, and standardized analysis pipelines for cell detection and region annotation in whole volumes of cleared brains.

Vol. 26(1), p. 24. Publication type: Journal Article. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11753021/pdf/

Citations: 0

A graph neural network approach for hierarchical mapping of breast cancer protein communities.

IF 2.9, Zone 3 (Biology), Q2 Biochemical Research Methods

BMC Bioinformatics

Pub Date : 2025-01-21 DOI: 10.1186/s12859-024-06015-x

Xiao Zhang, Qian Liu

Background: Comprehensively mapping the hierarchical structure of breast cancer protein communities and identifying potential biomarkers from them is a promising direction for breast cancer research. Existing approaches are subjective and fail to take information from protein sequences into consideration. Deep learning can automatically learn features from protein sequences and protein-protein interactions for hierarchical clustering.

Results: Using a large amount of publicly available proteomics data, we created a hierarchical tree for breast cancer protein communities using a novel hierarchical graph neural network, with the supervision of gene ontology terms and assistance of a pre-trained deep contextual language model. Then, a group-lasso algorithm was applied to identify protein communities that are under both mutation burden and survival burden, undergo significant alterations when targeted by specific drug molecules, and show cancer-dependent perturbations. The resulting hierarchical map of protein communities shows how gene-level mutations and survival information converge on protein communities at different scales. Internal validity of the model was established through the convergence on BRCA2 as a breast cancer hotspot. Further overlaps with breast cancer cell dependencies revealed SUPT6H and RAD21, along with their respective protein systems, HOST:37 and HOST:861, as potential biomarkers. Using gene-level perturbation data of the HOST:37 and HOST:861 gene sets, three FDA-approved drugs with high therapeutic value were selected as potential treatments to be further evaluated. These drugs include mercaptopurine, pioglitazone, and colchicine.

Conclusion: The proposed graph neural network approach to analyzing breast cancer protein communities in a hierarchical structure provides a novel perspective on breast cancer prognosis and treatment. By targeting entire gene sets, we were able to evaluate the prognostic and therapeutic value of genes (or gene sets) at different levels, from gene-level to system-level biology. Cancer-specific gene dependencies provide additional context for pinpointing cancer-related systems and drug-induced alterations can highlight potential therapeutic targets. These identified protein communities, in conjunction with other protein communities under strong mutation and survival burdens, can potentially be used as clinical biomarkers for breast cancer.

Vol. 26(1), p. 23. Publication type: Journal Article. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11749236/pdf/

Citations: 0

Benchmarking cell type annotation methods for 10x Xenium spatial transcriptomics data.

IF 2.9 Zone 3 (Biology) Q2 BIOCHEMICAL RESEARCH METHODS

BMC Bioinformatics

Pub Date : 2025-01-20 DOI: 10.1186/s12859-025-06044-0

Jinming Cheng, Xinyi Jin, Gordon K Smyth, Yunshun Chen

Background: Imaging-based spatial transcriptomics technologies allow us to explore spatial gene expression profiles at the cellular level. Cell type annotation of imaging-based spatial data is challenging due to the small gene panel, but it is a crucial step for downstream analyses. Many good reference-based cell type annotation tools have been developed for single-cell RNA sequencing and sequencing-based spatial transcriptomics data. However, the performance of the reference-based cell type annotation tools on imaging-based spatial transcriptomics data has not been well studied yet.

Results: We compared the performance of five reference-based methods (SingleR, Azimuth, RCTD, scPred and scmapCell) with the marker-gene-based manual annotation method on an imaging-based Xenium dataset of human breast cancer. We demonstrate a practical workflow for preparing a high-quality single-cell RNA reference, evaluating accuracy, and estimating the running time of reference-based cell type annotation tools.

Conclusions: SingleR was the best performing reference-based cell type annotation tool for the Xenium platform, being fast, accurate and easy to use, with results closely matching those of manual annotation.
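The comparison against manual annotation described above can be illustrated with a small sketch. The abstract does not say which accuracy measure the authors used, so plain label agreement is an assumption here, and the function name `annotation_agreement` and the toy labels are hypothetical.

```python
from collections import Counter


def annotation_agreement(manual, predicted):
    """Compare predicted cell type labels with manual labels.

    Returns the overall fraction of matching cells and, for each manual
    cell type, the fraction of its cells that the tool labelled identically.
    """
    if len(manual) != len(predicted):
        raise ValueError("label lists must have the same length")
    if not manual:
        raise ValueError("label lists must not be empty")

    totals = Counter(manual)  # number of cells per manual cell type
    hits = Counter(m for m, p in zip(manual, predicted) if m == p)

    overall = sum(hits.values()) / len(manual)
    per_type = {cell_type: hits[cell_type] / n for cell_type, n in totals.items()}
    return overall, per_type


# Toy labels (hypothetical), one entry per cell.
manual = ["T cell", "T cell", "B cell", "Tumor", "Tumor", "Tumor"]
predicted = ["T cell", "B cell", "B cell", "Tumor", "Tumor", "T cell"]

overall, per_type = annotation_agreement(manual, predicted)
print(f"overall agreement: {overall:.3f}")
for cell_type, score in sorted(per_type.items()):
    print(f"{cell_type}: {score:.3f}")
```

Reporting agreement per cell type as well as overall helps when a dataset is dominated by a few abundant cell types, since a single overall figure can hide poor performance on rare ones.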

BMC Bioinformatics, vol. 26(1), p. 22. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11744978/pdf/

Citations: 0

getphylo: rapid and automatic generation of multi-locus phylogenetic trees.

IF 2.9 Zone 3 (Biology) Q2 BIOCHEMICAL RESEARCH METHODS

BMC Bioinformatics

Pub Date : 2025-01-18 DOI: 10.1186/s12859-025-06035-1

T J Booth, S Shaw, P Cruz-Morales, T Weber

Background: The increasing amount of genomic data calls for tools that can create genome-scale phylogenies quickly and efficiently. Existing tools rely on large reference databases or require lengthy de novo calculations to identify orthologues, meaning that they have long run times and are limited in their taxonomic scope. To address this, we created getphylo, a Python tool for the rapid generation of phylogenetic trees de novo from annotated sequences.

Results: We present getphylo (Genbank to Phylogeny), a tool that automatically builds phylogenetic trees from annotated genomes alone. Orthologues are identified heuristically by searching for singletons (single copy genes) across all input genomes and the phylogeny is inferred from a concatenated alignment of all coding sequences by maximum likelihood. We performed a thorough benchmarking of getphylo against two existing tools, autoMLST and GTDB-tk, to show that it can produce trees of comparable quality in a fraction of the time. We also demonstrate the flexibility of getphylo across four case studies including bacterial and eukaryotic genomes, and biosynthetic gene clusters.

Conclusions: getphylo is a quick and reliable tool for the automated generation of genome-scale phylogenetic trees. getphylo can produce phylogenies comparable to other software in a fraction of the time, without the need for large local databases or intense computation. getphylo can rapidly identify orthologues from a wide variety of datasets regardless of taxonomic or genomic scope. The usability, speed, and flexibility of getphylo make it a valuable addition to the phylogenetics toolkit.
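The singleton heuristic and the concatenated alignment mentioned in the Results can be sketched in a few lines. This is not getphylo's code: the abstract does not describe how genes are grouped across genomes, so the sketch assumes each coding sequence already carries a gene-family label, and the names `shared_singletons` and `concatenate` and the toy data are hypothetical. The maximum-likelihood step would be run by a separate tree-building program and is not shown.

```python
from collections import Counter


def shared_singletons(genomes):
    """Return gene families present exactly once in every genome.

    genomes: dict mapping genome name -> list of gene-family labels,
    one label per annotated coding sequence (assumed to be given).
    """
    shared = None
    for families in genomes.values():
        counts = Counter(families)
        single_copy = {family for family, n in counts.items() if n == 1}
        shared = single_copy if shared is None else shared & single_copy
    return sorted(shared or [])


def concatenate(alignments, loci, genome_names):
    """Join per-locus aligned sequences into one supermatrix row per genome.

    alignments: dict mapping locus -> dict mapping genome name -> aligned sequence.
    """
    return {
        genome: "".join(alignments[locus][genome] for locus in loci)
        for genome in genome_names
    }


# Toy input (hypothetical labels and sequences).
genomes = {
    "genome_A": ["rpoB", "gyrA", "tuf", "tuf", "recA"],
    "genome_B": ["rpoB", "gyrA", "tuf", "recA"],
    "genome_C": ["rpoB", "gyrA", "recA", "recA"],
}
loci = shared_singletons(genomes)

alignments = {
    "gyrA": {"genome_A": "ATG-", "genome_B": "ATGC", "genome_C": "ATGA"},
    "rpoB": {"genome_A": "GGC", "genome_B": "GGT", "genome_C": "G-C"},
}
supermatrix = concatenate(alignments, loci, sorted(genomes))
print(loci)
print(supermatrix)
```

Restricting the analysis to families that are single-copy in every input genome avoids having to resolve paralogues, which is what makes the approach fast, at the cost of discarding loci that are duplicated or missing in any genome.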

BMC Bioinformatics, vol. 26(1), p. 21. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11748604/pdf/

Citations: 0

MGATAF: multi-channel graph attention network with adaptive fusion for cancer-drug response prediction.

IF 2.9 Zone 3 (Biology) Q2 BIOCHEMICAL RESEARCH METHODS

BMC Bioinformatics

Pub Date : 2025-01-17 DOI: 10.1186/s12859-024-05987-0

Dhekra Saeed, Huanlai Xing, Barakat AlBadani, Li Feng, Raeed Al-Sabri, Monir Abdullah, Amir Rehman

Background: Drug response prediction is critical in precision medicine to determine the most effective and safe treatments for individual patients. Traditional prediction methods relying on demographic and genetic data often fall short in accuracy and robustness. Recent graph-based models, while promising, frequently neglect the critical role of atomic interactions and fail to integrate drug fingerprints with SMILES for comprehensive molecular graph construction.

Results: We introduce the multimodal multi-channel graph attention network with adaptive fusion (MGATAF), a framework designed to enhance drug response predictions by capturing both local and global interactions among graph nodes. MGATAF improves drug representation by integrating SMILES and fingerprints, resulting in more precise predictions of drug effects. The methodology involves constructing multimodal molecular graphs, employing multi-channel graph attention networks to capture diverse interactions, and using adaptive fusion to integrate these interactions at multiple abstraction levels. Empirical results demonstrate MGATAF's superior performance compared to traditional and other graph-based techniques. For example, on the GDSC dataset, MGATAF achieved a 5.12% improvement in the Pearson correlation coefficient (PCC), reaching 0.9312 with an RMSE of 0.0225. Similarly, in new cell-line tests, MGATAF outperformed baselines with a PCC of 0.8536 and an RMSE of 0.0321 on the GDSC dataset, and a PCC of 0.7364 with an RMSE of 0.0531 on the CCLE dataset.

Conclusions: MGATAF significantly advances drug response prediction by effectively integrating multiple molecular data types and capturing complex interactions. This framework enhances prediction accuracy and offers a robust tool for personalized medicine, potentially leading to more effective and safer treatments for patients. Future research can expand on this work by exploring additional data modalities and refining the adaptive fusion mechanisms.
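The two metrics reported in the Results, the Pearson correlation coefficient (PCC) and the root mean squared error (RMSE), can be computed as in the sketch below. It shows the standard definitions only. The function names and the toy response values are hypothetical, and nothing here reproduces the MGATAF model or its reported numbers.

```python
import math


def pearson(y_true, y_pred):
    """Pearson correlation coefficient between observed and predicted values."""
    n = len(y_true)
    if n != len(y_pred) or n < 2:
        raise ValueError("need two equally long sequences with at least 2 values")
    mean_t = sum(y_true) / n
    mean_p = sum(y_pred) / n
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred))
    var_t = sum((t - mean_t) ** 2 for t in y_true)
    var_p = sum((p - mean_p) ** 2 for p in y_pred)
    if var_t == 0 or var_p == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_t * var_p)


def rmse(y_true, y_pred):
    """Root mean squared error between observed and predicted values."""
    n = len(y_true)
    if n != len(y_pred) or n == 0:
        raise ValueError("need two equally long, non-empty sequences")
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)


# Toy drug-response values (hypothetical).
observed = [0.10, 0.40, 0.50, 0.90]
predicted = [0.20, 0.35, 0.55, 0.80]

print(f"PCC:  {pearson(observed, predicted):.4f}")
print(f"RMSE: {rmse(observed, predicted):.4f}")
```

The two metrics are complementary: PCC measures whether predictions rank and scale with the observed responses, while RMSE measures their absolute error, so a model can score well on one and poorly on the other.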

BMC Bioinformatics, vol. 26(1), p. 19. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11742231/pdf/

Citations: 0