Bioinformatics Advances: latest articles

StarPepWeb: an integrative, graph-based resource for bioactive peptides.
IF 2.8 Q2 MATHEMATICAL & COMPUTATIONAL BIOLOGY Pub Date : 2025-10-16 eCollection Date: 2025-01-01 DOI: 10.1093/bioadv/vbaf261
Christian López, Roberto Cárdenas, Longendri Aguilera-Mendoza, Guillermin Agüero-Chapin, Félix Martínez-Rios, César R García-Jacas, Noel Pérez-Pérez, Yovani Marrero-Ponce

Motivation: The rapid growth in the number of bioactive peptide sequences presents challenges for organization and analysis. Existing repositories often specialize in particular functions, taxonomic origins, or structural classes, but most remain isolated, use heterogeneous metadata, and lack uniform descriptors or structural models. The few integrative web services that exist offer only partial coverage or depth. As a result, reproducible and comprehensive exploration of the bioactive peptide landscape remains limited, underscoring the need for a unified, source-tracked, extensible platform.

Results: We present StarPepWeb, a freely accessible web application that democratizes access to StarPepDB, one of the largest curated repositories of bioactive peptides. The platform integrates 45 120 non-redundant sequences from 40 public databases into a source-tracked graph enriched with metadata, physicochemical features, and predicted 3D structures from ESMFold. Each peptide is represented with ESM-2 embeddings and iFeature descriptors, while the interface supports metadata-aware filtering, alignment-based similarity searches with single and multiple queries, and interactive visualization. A microservice-oriented architecture ensures scalability, maintainability, and reproducible versioned downloads, including Neo4j exports. StarPepWeb thus overcomes deployment and expertise barriers of the standalone database, providing an extensible, cloud-hosted framework for integrative bioactive peptide analysis.

Availability and implementation: StarPepWeb is freely available at https://starpepweb.org. Source code and documentation are hosted at https://github.com/starpep-web.

Citations: 0
The Hydractinia Genome Project Portal: multi-omic annotation and visualization of Hydractinia genomic datasets.
IF 2.8 Q2 MATHEMATICAL & COMPUTATIONAL BIOLOGY Pub Date : 2025-10-15 eCollection Date: 2025-01-01 DOI: 10.1093/bioadv/vbaf215
R Travis Moreland, Christine E Schnitzler, Suiyuan Zhang, Sumeeta Singh, Tyra G Wolfsberg, Andreas D Baxevanis

Motivation: The colonial hydroid Hydractinia exhibits several unique biological properties, including a remarkable regenerative capacity and the ability to distinguish self from non-self, characteristics that make Hydractinia species valuable models for studying human disease and aging. The availability of well-annotated multi-omic data, as well as tools to visualize these data, is essential for advancing the use of these model organisms to enhance our understanding of the relationship between genomic and morphological complexity, the evolution of multicellularity, and the emergence of novel cell types.

Results: We present the Hydractinia Genome Project Portal, a comprehensive resource providing genomic, transcriptomic, and proteomic datasets for two widely studied Hydractinia species. The portal provides extensive sequence, structure, and functional annotation resources that are not available elsewhere, including genome browsers, a single-cell gene expression atlas, a protein structure viewer, and a custom BLAST implementation. We demonstrate the portal's utility for biological discovery and have used a subset of Hydractinia-specific stem cell gene markers to explore known gaps in annotation transfer methods, illustrating how structure-based deep learning methods such as DeepFRI can significantly improve the functional annotation of heretofore unannotated i-cell markers.

Availability and implementation: The Hydractinia Genome Project Portal is freely available at https://research.nhgri.nih.gov/hydractinia.

Citations: 0
ICCTax: a hierarchical taxonomic classifier for metagenomic sequences on a large language model.
IF 2.8 Q2 MATHEMATICAL & COMPUTATIONAL BIOLOGY Pub Date : 2025-10-15 eCollection Date: 2025-01-01 DOI: 10.1093/bioadv/vbaf257
Yichun Gao, Jiaxing Bai, Feng Zhou, Yushuang He, Ying Wang, Xiaobing Huang

Motivation: Metagenomic data increasingly reflect the coexistence of species from Archaea, Bacteria, Eukaryotes, and Viruses in complex environments. Taxonomic classification across the four superkingdoms is essential for understanding microbial communities, exploring genomic evolutionary relationships, and identifying novel species. This task is inherently imbalanced, uneven, and hierarchical. Genomic sequences provide crucial information for taxonomy classification, but many existing methods relying on sequence similarity to reference genomes often leave sequences misclassified due to incomplete or absent reference databases. Large language models offer a novel approach to extract intrinsic characteristics from sequences.

Results: We present ICCTax, a classifier integrating the large language model HyenaDNA with complementary-view-based hierarchical metric learning and hierarchical-level compactness loss to identify taxonomic genomic sequences. ICCTax accurately classifies sequences to 155 genera and 43 phyla across the four superkingdoms, including unseen taxa. Across three datasets built with different strategies, ICCTax outperforms baseline methods, particularly on Out-of-Distribution data. On Simulated Marine Metagenomic Communities datasets from three oceanic sites, DairyDB-16S rRNA, Tara Oceans, and wastewater metagenomic datasets, it demonstrates strong performance, showcasing real-world applicability. ICCTax can further support identification of novel species and functional genes across diverse environments, enhancing understanding of microbial ecology.

Availability and implementation: Code is available at https://github.com/Ying-Lab/ICCTax.

Citations: 0
In silico analysis of insect-associated bacterial phytases reveals optimal biochemical properties and function in poultry gut condition.
IF 2.8 Q2 MATHEMATICAL & COMPUTATIONAL BIOLOGY Pub Date : 2025-10-15 eCollection Date: 2025-01-01 DOI: 10.1093/bioadv/vbaf256
Olyad Erba Urgessa, Ketema Tafess Tulu, Mesfin Tafesse Gemeda, Hunduma Dinka

Motivation: Insect guts may harbor phytase-producing bacteria applicable in poultry nutrition, but only Serratia sp. TN49 and its histidine acid phytase (AEQ29498.1) have been studied for this purpose. Therefore, AEQ29498.1 was used as a query to conduct a homology search for insect-associated bacterial phytases, followed by prediction of their structure and function. This in silico analysis of phytase may lead to the isolation of native phytase-producing bacteria from insect guts, potentially facilitating the production of desirable phytases for use in feed additives.

Results: Twenty-six phytases from bacteria associated with the guts of black soldier fly larvae, fruit flies, and honey bees were identified. The mature chains of these phytases, except for the 4-phytase of Bartonella apis PEB0150, were predicted to carry a positive charge under the acidic conditions of the poultry upper gastrointestinal tract. They are predicted to be stable (instability indices <40) and belong to the histidine acid phosphatase family, whose members have proven to be effective poultry feed additives. The three-dimensional structure of the mature histidine-type phosphatase of Tatumella sp. JGM130 demonstrated the best quality and was found to be a homo-tetrameric protein. Molecular docking confirmed phytate binding at the catalytic motifs of the histidine acid phosphatase family, RHGVRPP/AP/Q and HD.
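
The charge prediction described above can be illustrated with a small calculation. The following is a minimal sketch, not the authors' workflow: it estimates the net charge of a chain from the Henderson-Hasselbalch equation, using an illustrative set of pKa values (an assumption; published scales differ slightly) and a made-up toy sequence rather than a real phytase. The pH values in the demo are likewise only illustrative of acidic gut conditions.

```python
# Illustrative pKa values (an assumption; published scales differ slightly).
PKA_POSITIVE = {"K": 10.8, "R": 12.5, "H": 6.5}
PKA_NEGATIVE = {"D": 3.9, "E": 4.1, "C": 8.5, "Y": 10.1}
PKA_N_TERM = 8.6
PKA_C_TERM = 3.6


def net_charge(sequence: str, ph: float) -> float:
    """Estimate the net charge of a peptide chain at a given pH."""
    seq = sequence.upper()
    # Protonated basic groups each contribute a fraction of +1.
    charge = 1.0 / (1.0 + 10 ** (ph - PKA_N_TERM))
    for residue, pka in PKA_POSITIVE.items():
        charge += seq.count(residue) / (1.0 + 10 ** (ph - pka))
    # Deprotonated acidic groups each contribute a fraction of -1.
    charge -= 1.0 / (1.0 + 10 ** (PKA_C_TERM - ph))
    for residue, pka in PKA_NEGATIVE.items():
        charge -= seq.count(residue) / (1.0 + 10 ** (pka - ph))
    return charge


if __name__ == "__main__":
    toy_chain = "MKRHDEAGKLH"  # hypothetical sequence, not a real phytase
    for ph in (2.5, 4.5, 7.0):
        print(f"pH {ph}: net charge {net_charge(toy_chain, ph):+.2f}")
```

A chain rich in lysine, arginine, and histidine stays positively charged at low pH because its acidic side chains are mostly protonated there; dedicated tools apply the same idea with their own pKa tables.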

Citations: 0
Pathogenicity patterns in cytochrome P450 family.
IF 2.8 Q2 MATHEMATICAL & COMPUTATIONAL BIOLOGY Pub Date : 2025-10-14 eCollection Date: 2025-01-01 DOI: 10.1093/bioadv/vbaf231
Anna Špačková, Nina Kadášová, Ivana Hutařová Vařeková, Karel Berka

Motivation: Cytochrome P450 proteins play a crucial role in human metabolism, ranging from hormone production to drug metabolism. While several common variants have known effects on the performance of individual cytochrome P450 proteins, experimental pathogenicity information is usually limited to only a few mutations. Current pathogenicity prediction software extends the scope to virtual mutation of every amino acid with all possible substitutions. In this work, we conduct a comprehensive exploration that reveals pathogenicity patterns in the human cytochrome P450 family. Pathogenicity analysis was conducted across proteins using the SIFT, AlphaMissense, and PrimateAI-3D algorithms.

Results: Our findings indicate a progressive increase in pathogenicity along protein tunnels (identified via MOLE) toward the cofactor binding site, underscoring the essential role of cofactor interactions in enzymatic function. Notably, the integrity of tunnels and cofactor environment emerges as a critical factor, with even single amino acid alterations potentially disrupting molecular guidance to active sites. These insights highlight the fundamental role of structural pathways in preserving cytochrome P450 functionality, with implications for understanding disease-associated variants and drug metabolism.

Availability and implementation: Data and source code can be found at https://github.com/annaspac/P450_pathogenicity_codes.

Citations: 0
Asymmetric integration of various cancer datasets for identifying risk-associated variants and genes.
IF 2.8 Q2 MATHEMATICAL & COMPUTATIONAL BIOLOGY Pub Date : 2025-10-14 eCollection Date: 2025-01-01 DOI: 10.1093/bioadv/vbaf253
Ruixuan Wang, Lam Tran, Benjamin Brennan, Lars G Fritsche, Kevin He, J Chad Brenner, Hui Jiang

Motivation: Cancer genomic research provides an opportunity to identify cancer risk-associated genes, but often suffers from undesirably low statistical power due to limited sample sizes. Integrated analysis across different cancers has the potential to enhance statistical power for identifying pan-cancer risk genes. However, substantial heterogeneity across cancers makes this challenging.

Results: Recently, a novel asymmetric integration method was developed that can deal with data heterogeneity and exclude unhelpful datasets from the analysis. We adapted and applied this method to integrate genotype datasets with matched case and control individuals from the Michigan Genomics Initiative, using each cancer as the primary dataset of interest and the other cancers as auxiliary datasets, respectively. Conditional logistic regression models were coupled with the asymmetric integrated framework to handle the matched case-control study design and permutation tests were performed to control for false discovery rates (FDRs). At the same FDR level, the integrated analysis found more potential genetic variants and genes that are associated with the risks of various cancers, showcasing the promise of the proposed approach for integrated analysis of cancer datasets.
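
The permutation-based control of the false discovery rate mentioned above can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the authors' method: the test statistic is a plain absolute sum of case-minus-control genotype differences standing in for the conditional logistic regression, and the function names, toy data, and threshold are all hypothetical. Labels are permuted within matched pairs, which amounts to flipping the sign of each pair's difference.

```python
import random


def pair_statistic(diffs, signs):
    """Absolute sum of signed case-minus-control genotype differences."""
    return abs(sum(d * s for d, s in zip(diffs, signs)))


def permutation_fdr(diffs_by_variant, threshold, n_perm=200, seed=0):
    """Estimate the FDR of calling variants with statistic >= threshold."""
    rng = random.Random(seed)
    n_pairs = len(diffs_by_variant[0])
    identity = [1] * n_pairs
    observed = [pair_statistic(d, identity) for d in diffs_by_variant]
    n_called = sum(stat >= threshold for stat in observed)
    if n_called == 0:
        return 0.0, observed
    null_calls = 0
    for _ in range(n_perm):
        # Swapping case/control labels inside a matched pair flips the sign
        # of that pair's difference; the same flips are used for all variants.
        signs = [rng.choice((-1, 1)) for _ in range(n_pairs)]
        null_calls += sum(
            pair_statistic(d, signs) >= threshold for d in diffs_by_variant
        )
    # Mean number of null calls per permutation approximates the expected
    # number of false discoveries at this threshold.
    expected_false = null_calls / n_perm
    return min(1.0, expected_false / n_called), observed


if __name__ == "__main__":
    # Toy data: 30 matched pairs, one associated variant and one null variant.
    toy = [[1] * 30, [1, -1] * 15]
    fdr, stats = permutation_fdr(toy, threshold=20)
    print("observed statistics:", stats)
    print("estimated FDR:", fdr)
```

Using the same sign flips for every variant within a permutation preserves the correlation between variants, which is one common reason for preferring permutation over analytical FDR corrections.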

Availability and implementation: Our method is available as source code at https://github.com/rxxwang/integrate_cancer.

Citations: 0
ProtFun: a protein function prediction model using graph attention networks with a protein large language model.
IF 2.8 Q2 MATHEMATICAL & COMPUTATIONAL BIOLOGY Pub Date : 2025-10-11 eCollection Date: 2025-01-01 DOI: 10.1093/bioadv/vbaf245
Muhammed Talo, Serdar Bozdag

Motivation: Understanding protein functions facilitates the identification of the underlying causes of many diseases and guides research into new therapeutic targets and medications. With the advancement of high-throughput technologies, obtaining novel protein sequences has become routine. However, determining protein functions experimentally is cost- and labor-prohibitive. Therefore, it is crucial to develop computational methods for automatic protein function prediction.

Results: In this study, we propose a multimodal deep learning architecture called ProtFun to predict protein functions. ProtFun integrates protein large language model embeddings as node features in a protein family network. Employing graph attention networks on this protein family network, ProtFun learns protein embeddings, which are integrated with protein signature representations from InterPro to train a protein function prediction model. We evaluated our architecture using three benchmark datasets. Our results showed that our proposed approach outperformed current state-of-the-art methods for most cases. An ablation study also highlighted the importance of different components of ProtFun.

Availability and implementation: The data and source code of ProtFun are available at https://github.com/bozdaglab/ProtFun under the Creative Commons Attribution Non Commercial 4.0 International Public License.

Citations: 0
Nextpie: a web-based reporting tool and database for reproducible Nextflow pipelines.
IF 2.8 Q2 MATHEMATICAL & COMPUTATIONAL BIOLOGY Pub Date : 2025-10-10 eCollection Date: 2025-01-01 DOI: 10.1093/bioadv/vbaf252
Bishwa Ghimire, Nicholas Booth, Tapio Lönnberg, Tero Aittokallio

Motivation: High-throughput genomic data analysis consists of the inexorably intertwined inputs and outputs of a vast array of bioinformatic analysis tools. To guarantee streamlined and reproducible analyses, the often complex data analysis pipelines need to be run using workflow management tools. Nextflow is a popular tool for automating such pipelines. Nextflow records key pipeline data, such as the submission time, start time, completion time, CPU usage, memory usage, and disk usage for each task run. These data are stored in log files, often scattered across a file system. Aggregating the resource usage information needed to optimize Nextflow pipelines and improve reproducibility, and parsing and managing the underlying log data, can therefore quickly become cumbersome.

Results: Here, we present a web-based tool, Nextpie, which provides both a database and a reporting tool for Nextflow pipelines. Nextpie stores comprehensive resource usage information in a relational database, thus facilitating and accelerating the performance of a variety of data analyses and interactive visualizations, providing an easily comprehensible overview of a pipeline's resource usage.
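The idea of keeping per-task resource usage in a relational database can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Nextpie's actual schema or code: the table name `task`, the column names, the helper functions `load_trace` and `usage_by_process`, and the sample records are all hypothetical, and the input simply imitates a tab-separated per-task log.

```python
import csv
import io
import sqlite3

# Hypothetical per-task records in tab-separated form. The column names and
# values are invented for illustration and are not taken from Nextpie.
SAMPLE_TRACE = (
    "task_id\tname\tstatus\trealtime_s\tpeak_rss_mb\n"
    "1\tFASTQC (s1)\tCOMPLETED\t120\t512\n"
    "2\tFASTQC (s2)\tCOMPLETED\t150\t640\n"
    "3\tALIGN (s1)\tCOMPLETED\t900\t4096\n"
)


def load_trace(conn, trace_text, run_name):
    """Insert one run's task records into an (assumed) `task` table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS task ("
        "run TEXT, task_id INTEGER, process TEXT, status TEXT, "
        "realtime_s REAL, peak_rss_mb REAL)"
    )
    reader = csv.DictReader(io.StringIO(trace_text), delimiter="\t")
    rows = [
        (
            run_name,
            int(r["task_id"]),
            r["name"].split(" ")[0],  # process name without the sample tag
            r["status"],
            float(r["realtime_s"]),
            float(r["peak_rss_mb"]),
        )
        for r in reader
    ]
    conn.executemany("INSERT INTO task VALUES (?, ?, ?, ?, ?, ?)", rows)
    conn.commit()
    return len(rows)


def usage_by_process(conn):
    """Return (process, task count, total runtime, peak memory) per process."""
    cur = conn.execute(
        "SELECT process, COUNT(*), SUM(realtime_s), MAX(peak_rss_mb) "
        "FROM task GROUP BY process ORDER BY process"
    )
    return cur.fetchall()


conn = sqlite3.connect(":memory:")
load_trace(conn, SAMPLE_TRACE, "demo_run")
summary = usage_by_process(conn)
```

Once the records sit in one table, questions such as "which process dominates runtime across runs" become single SQL queries instead of ad hoc parsing of scattered log files.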

Availability and implementation: The Nextpie source code, user documentation, an SQLite database with test data, and a Nextflow example pipeline are available at GitHub (https://github.com/bishwaG/Nextpie).

Citations: 0
GeoGenIE: a deep learning approach to predict geographic provenance of biodiversity samples from genomic SNPs.
IF 2.8 Q2 MATHEMATICAL & COMPUTATIONAL BIOLOGY Pub Date : 2025-10-09 eCollection Date: 2025-01-01 DOI: 10.1093/bioadv/vbaf250
Bradley T Martin, Zachery D Zbinden, Michael E Douglas, Marlis R Douglas, Tyler K Chafin

Motivation: Determining the geographic origin of samples is a common objective in wildlife management, forensics, and conservation. Current methods often assume evolutionary models or require extensive reference datasets that are costly and difficult to develop, and they perform poorly with uneven or biased sampling. Supervised deep learning offers a promising alternative by learning complex patterns without prior model specifications. Combined with novel geo-genetic data augmentation and preprocessing techniques, it can reduce reference panel demands and improve performance across diverse sampling schemes, broadening accurate provenance determination to more study systems.

Results: We present GeoGenIE, an open-source software package powered by PyTorch for geographic provenance prediction from genomic data. GeoGenIE implements a multilayer perceptron architecture within an automated hyperparameter tuning framework, incorporating preprocessing, geo-genetic outlier detection, and data augmentation to improve accuracy in sparsely sampled regions. Benchmarked against a comparable approach on white-tailed deer (Odocoileus virginianus) double-digest restriction-site associated DNA sequencing data, GeoGenIE achieved substantially improved geolocation accuracy with less spatial bias using a smaller SNP panel. Gains were most evident in undersampled regions, underscoring effectiveness under challenging conditions. Its parallelized execution also produced fast runtimes, promoting its application to large datasets.
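For readers unfamiliar with how geolocation accuracy can be quantified, one common choice is the great-circle (haversine) distance between the true and predicted coordinates, summarized over all samples. The abstract does not state which metric GeoGenIE reports, so the sketch below is an assumption for illustration only; the function names `haversine_km` and `median_error_km` and the example coordinates are hypothetical.

```python
import math
import statistics

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (
        math.sin(dphi / 2) ** 2
        + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    )
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def median_error_km(true_coords, pred_coords):
    """Median haversine error over paired (lat, lon) tuples."""
    errors = [
        haversine_km(t[0], t[1], p[0], p[1])
        for t, p in zip(true_coords, pred_coords)
    ]
    return statistics.median(errors)


# Invented coordinates: three samples with true and predicted locations.
true_xy = [(36.0, -94.0), (35.5, -92.5), (34.8, -91.9)]
pred_xy = [(36.0, -94.0), (35.6, -92.4), (35.0, -92.3)]
err = median_error_km(true_xy, pred_xy)
```

A median is used here rather than a mean so that a few badly mislocated samples do not dominate the summary; other summaries are equally reasonable.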

Availability and implementation: Open-source at https://github.com/btmartin721/geogenie and https://pypi.org/project/GeoGenIE/.

Citations: 0
OncotreeVIS-an interactive graphical user interface for visualizing mutation tree cohorts.
IF 2.8 Q2 MATHEMATICAL & COMPUTATIONAL BIOLOGY Pub Date : 2025-10-09 eCollection Date: 2025-01-01 DOI: 10.1093/bioadv/vbaf247
Monica-Andreea Baciu-Drăgan, Niko Beerenwinkel

Summary: In recent years, developments in single-cell next-generation sequencing technology and computational methodology have made it possible to reconstruct, with increasing precision, the evolutionary history of tumors and their cell phylogenies, represented as mutation trees. Many mutation tree inference tools exist, but they support neither detailed visual tree inspection nor tree comparison and analysis at the cohort level, an important task in computational oncology. We developed oncotreeVIS, an interactive graphical user interface for visualizing mutation tree cohorts and tree posterior distributions obtained from mutation tree inference tools. OncotreeVIS can display mutation trees that encode single or joint genetic events, such as point mutations and copy number changes, and highlight matching subclones, conserved trajectories, and drug-gene interactions at the cohort level. OncotreeVIS facilitates the visual inspection of mutation tree clusters and pairwise tree distances. It is available both as a JavaScript library that can be used locally and as a web application that can be accessed online. It includes seven default datasets of public mutation tree cohorts for visualization, while new mutation trees are provided in a predefined JSON format.

Availability and implementation: https://cbg-ethz.github.io/oncotreeVIS.

Citations: 0