BMC Bioinformatics: Latest Articles

A multi-stage weakly supervised design for spheroid segmentation to explore mesenchymal stem cell differentiation dynamics.
IF 2.9 | Zone 3, Biology | Q2 BIOCHEMICAL RESEARCH METHODS | Pub Date: 2025-01-17 | DOI: 10.1186/s12859-024-06031-x
Arash Shahbazpoor Shahbazi, Farzin Irandoost, Reza Mahdavian, Seyedehsamaneh Shojaeilangari, Abdollah Allahvardi, Hossein Naderi-Manesh
BMC Bioinformatics 26(1):20. Full text: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11742216/pdf/
Citations: 0
TopoQual polishes circular consensus sequencing data and accurately predicts quality scores.
IF 2.9 | Zone 3, Biology | Q2 BIOCHEMICAL RESEARCH METHODS | Pub Date: 2025-01-16 | DOI: 10.1186/s12859-024-06020-0
Minindu Weerakoon, Sangjin Lee, Emily Mitchell, Haynes Heaton

Background: Pacific Biosciences (PacBio) circular consensus sequencing (CCS), also known as high-fidelity (HiFi) technology, has revolutionized modern genomics by producing long (10+ kb), highly accurate reads. This is achieved by sequencing a circularized DNA molecule multiple times and combining the passes into a consensus sequence. The accuracy and quality value estimates currently provided by HiFi technology are more than sufficient for applications such as genome assembly and germline variant calling. However, the estimated quality scores are not accurate enough for somatic variant calling on single reads.

Results: To address the challenge of inaccurate quality scores for somatic variant calling, we introduce TopoQual, a novel tool designed to enhance the accuracy of base quality predictions. TopoQual leverages techniques including partial order alignments (POA), topologically parallel bases, and deep learning algorithms to polish consensus sequences. Our results demonstrate that TopoQual corrects approximately 31.9% of errors in PacBio consensus sequences. Additionally, it validates base qualities up to q59, which corresponds to one error in 0.9 million bases. These improvements will significantly enhance the reliability of somatic variant calling using HiFi data.
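The quality scores above are on the Phred scale. As a reminder of what those numbers mean, the sketch below converts between Phred scores and error probabilities using the standard definition Q = -10·log10(p). Under that definition Q59 gives p ≈ 1.26 × 10⁻⁶, roughly one error per 0.79 million bases; the abstract's figure of 0.9 million may reflect rounding or an empirical estimate, which this sketch does not try to reproduce. It only illustrates the scale and does not implement TopoQual.

```python
import math


def phred_to_error_prob(q: float) -> float:
    """Error probability for a Phred quality score: p = 10 ** (-Q / 10)."""
    return 10 ** (-q / 10)


def error_prob_to_phred(p: float) -> float:
    """Phred quality score for an error probability: Q = -10 * log10(p)."""
    return -10 * math.log10(p)


for q in (20, 30, 40, 59):
    p = phred_to_error_prob(q)
    # 1 / p is the expected number of bases per error at this quality.
    print(f"Q{q}: p = {p:.3g}, about 1 error per {1 / p:,.0f} bases")
```

Each additional 10 Phred units corresponds to a tenfold drop in the expected error rate.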

Conclusion: TopoQual represents a significant advancement in genomics by improving the accuracy of base quality predictions for PacBio HiFi sequencing data. By correcting a substantial proportion of errors and achieving high base quality validation, TopoQual enables confident and accurate somatic variant calling. This tool not only addresses a critical limitation of current HiFi technology but also opens new possibilities for precise genomic analysis in various research and clinical applications.

BMC Bioinformatics 26(1):17. Full text: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11737182/pdf/
Citations: 0
Zim4rv: an R package to modeling zero-inflated count phenotype on regional-based rare variants.
IF 2.9 | Zone 3, Biology | Q2 BIOCHEMICAL RESEARCH METHODS | Pub Date: 2025-01-16 | DOI: 10.1186/s12859-024-06029-5
Xiaomin Liu, Yi-Ju Li, Qiao Fan

Background: With the advance of next-generation sequencing, various gene-based rare variant association tests have been developed, particularly for binary and continuous phenotypes. In contrast, fewer methods are available for traits not following binomial or normal distributions. To address this, we previously proposed a set of burden- and kernel-based rare variant tests for count data following zero-inflated Poisson (ZIP) distributions, referred to as ZIP-b and ZIP-k tests. We sought to extend the methods to accommodate negative binomial distribution and implemented these tests in a new R package.
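To make the distributions concrete, here is a minimal sketch of the zero-inflated Poisson (ZIP) and zero-inflated negative binomial (ZINB) probability mass functions, plus a simple burden score that collapses the rare variants in a region into one number per subject. The parameter names (`pi`, `lam`, `r`, `mu`), the unweighted default, and the helper `burden_score` are assumptions for illustration; the abstract does not say how ZIM4rv parameterises the models or weights variants. In a burden test, a score like this would typically enter the count part and/or the zero-inflation part of the model as a covariate.

```python
import math


def poisson_pmf(k: int, lam: float) -> float:
    """Poisson probability of observing count k with mean lam."""
    return math.exp(-lam) * lam ** k / math.factorial(k)


def zip_pmf(k: int, pi: float, lam: float) -> float:
    """Zero-inflated Poisson: a structural zero with probability pi,
    otherwise a Poisson(lam) draw (which can also be zero)."""
    base = (1 - pi) * poisson_pmf(k, lam)
    return pi + base if k == 0 else base


def nb_pmf(k: int, r: float, mu: float) -> float:
    """Negative binomial with size r > 0 and mean mu > 0 (variance mu + mu**2 / r)."""
    p = r / (r + mu)
    log_coef = math.lgamma(k + r) - math.lgamma(r) - math.lgamma(k + 1)
    return math.exp(log_coef + r * math.log(p) + k * math.log(1 - p))


def zinb_pmf(k: int, pi: float, r: float, mu: float) -> float:
    """Zero-inflated negative binomial, built the same way as zip_pmf."""
    base = (1 - pi) * nb_pmf(k, r, mu)
    return pi + base if k == 0 else base


def burden_score(genotypes, weights=None) -> float:
    """Collapse minor-allele counts (0/1/2) of the rare variants in a region
    into one per-subject score; unweighted by default (hypothetical helper)."""
    if weights is None:
        weights = [1.0] * len(genotypes)
    return sum(w * g for w, g in zip(weights, genotypes))


# Excess zeros: P(Y = 0) under ZIP vs. a plain Poisson with the same lam.
print(zip_pmf(0, pi=0.3, lam=2.0), poisson_pmf(0, 2.0))
```

The negative binomial adds an overdispersion parameter `r`, which is why a ZINB model can fit counts whose variance exceeds their mean better than ZIP.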

Results: We introduce ZIM4rv, an R package designed to analyze the association of rare variants with zero-inflated count outcomes. The package offers two models developed by our team: our previously proposed ZIP-b and ZIP-k tests, and the newly derived zero-inflated negative binomial burden and kernel tests (ZINB-b, ZINB-k). Additionally, we include an ad hoc two-stage analysis that tests zero versus non-zero as a binary outcome and, separately, the non-zero counts as a continuous outcome. To showcase the utility of the package, we applied it to neuritic plaque count data from the ROSMAP cohort.

Conclusion: The R package ZIM4rv presents an integrated workflow for conducting association tests on a set of rare variants with zero-inflated count data.

BMC Bioinformatics 26(1):18. Full text: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11740424/pdf/
Citations: 0
Predicting drug combination side effects based on a metapath-based heterogeneous graph neural network.
IF 2.9 | Zone 3, Biology | Q2 BIOCHEMICAL RESEARCH METHODS | Pub Date: 2025-01-15 | DOI: 10.1186/s12859-024-06028-6
Leixia Tian, Qi Wang, Zhiheng Zhou, Xiya Liu, Ming Zhang, Guiying Yan

In recent years, combined drug screening has played an important role in modern drug discovery, and synergistic drug combinations are crucial in the treatment of many diseases. However, the risk of toxic side effects is likely to grow as more drugs are combined, so accurately predicting the side effects of drug combinations is equally important. In this paper, we built two models. The first, a Metapath-based Aggregated Embedding Model on Single Drug-Side Effect Heterogeneous Information Network (MAEM-SSHIN), extracts features from a heterogeneous information network of single-drug side effects. The second, a Graph Convolutional Network on Combinatorial drugs and Side effect Heterogeneous Information Network (GCN-CSHIN), transforms the complex task of predicting multiple side effects for a drug pair into the more manageable task of predicting relationships between drug combinations and individual side effects. Together, MAEM-SSHIN and GCN-CSHIN form a unified framework for predicting potential side effects of combination drug therapies, and this integration improves prediction accuracy, efficiency, and scalability. Our experimental results show that the combined framework outperforms existing methods in predicting side effects, marking a significant advance in pharmaceutical research.
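The key reformulation, turning "which side effects does this drug pair cause?" into link prediction between drug-combination nodes and individual side-effect nodes, can be sketched as follows. Everything here is an assumption for illustration: the toy triples, the node naming, the standard symmetric-normalised GCN propagation rule, random input features standing in for the MAEM-SSHIN embeddings, and the dot-product link scorer. It is not the authors' GCN-CSHIN architecture and omits the metapath-based features entirely.

```python
import numpy as np

# Hypothetical toy data: (drug_a, drug_b, side_effect) triples.
triples = [("D1", "D2", "nausea"), ("D1", "D2", "rash"), ("D2", "D3", "rash")]


def build_combination_graph(triples):
    """Bipartite graph: one node per unordered drug pair, one per side effect."""
    combos = sorted({tuple(sorted((a, b))) for a, b, _ in triples})
    effects = sorted({se for _, _, se in triples})
    index = {c: i for i, c in enumerate(combos)}
    index.update({se: len(combos) + j for j, se in enumerate(effects)})
    n = len(combos) + len(effects)
    adj = np.zeros((n, n))
    for a, b, se in triples:
        i, j = index[tuple(sorted((a, b)))], index[se]
        adj[i, j] = adj[j, i] = 1.0
    return adj, index


def gcn_layer(adj, features, weight):
    """One graph-convolution step: ReLU(D^-1/2 (A + I) D^-1/2 H W)."""
    a_hat = adj + np.eye(adj.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    norm = a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    return np.maximum(norm @ features @ weight, 0.0)


def link_score(h, index, combo, effect):
    """Score a (drug combination, side effect) pair by an embedding dot product."""
    return float(h[index[combo]] @ h[index[effect]])


adj, index = build_combination_graph(triples)
rng = np.random.default_rng(0)
h = gcn_layer(adj, rng.normal(size=(adj.shape[0], 4)), rng.normal(size=(4, 2)))
print(link_score(h, index, ("D2", "D3"), "nausea"))
```

Sorting each pair makes ("D1", "D2") and ("D2", "D1") the same combination node, so the model predicts one label set per unordered pair.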

BMC Bioinformatics 26(1):16. Full text: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11734363/pdf/
Citations: 0
Development of a TSR-based method for understanding structural relationships of cofactors and local environments in photosystem I.
IF 2.9 | Zone 3, Biology | Q2 BIOCHEMICAL RESEARCH METHODS | Pub Date: 2025-01-14 | DOI: 10.1186/s12859-025-06038-y
Lujun Luo, Tarikul I Milon, Elijah K Tandoh, Walter J Galdamez, Andrei Y Chistoserdov, Jianping Yu, Jan Kern, Yingchun Wang, Wu Xu

Background: All chemical forms of energy and oxygen on Earth are generated via photosynthesis, in which light energy is converted into redox energy by two photosystems (PS I and PS II). A growing number of PS I 3D structures have been deposited in the Protein Data Bank (PDB). The Triangular Spatial Relationship (TSR)-based algorithm converts 3D structures into integers (TSR keys). Using the PS I 3D structures and the TSR-based algorithm, we conducted a comprehensive study to answer three questions: (i) Are the electron cofactors, including P700, A-1 and A0, which are chemically identical chlorophylls, structurally different? (ii) PS I has two electron transfer chains (the A and B branches); are the cofactors on the two branches structurally different? (iii) Are the amino acids in cofactor binding sites structurally different from those outside cofactor binding sites?
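To show how a structure can be turned into integer keys, the sketch below uses a deliberately simplified, hypothetical encoding; the published TSR key formula is not given in this abstract. For every triangle of labelled 3D points it bins the longest side and the angle at one vertex, then packs those bins together with the vertex labels into one integer. Because only distances and angles are used, the keys do not change under rotation or translation, which is what makes such keys useful for comparing structures. The bin widths, the label range 0..99, and the generalised Jaccard comparison are all assumptions.

```python
import itertools
import math
from collections import Counter


def triangle_key(labels, coords, len_bin=1.0, angle_bin=10.0):
    """Hypothetical TSR-style integer key for one triangle of labelled points.

    labels: three integer atom/residue labels, assumed to lie in 0..99.
    coords: three (x, y, z) tuples.
    """
    # Sort vertices by label so the key does not depend on input order
    # (ties are simply broken by input order in this simplified sketch).
    order = sorted(range(3), key=lambda i: labels[i])
    lab = [labels[i] for i in order]
    p0, p1, p2 = (coords[i] for i in order)
    a, b, c = math.dist(p0, p1), math.dist(p0, p2), math.dist(p1, p2)
    # Bin the longest side (assumed to need fewer than 100 bins).
    len_key = int(max(a, b, c) // len_bin)
    # Angle at p0 via the law of cosines, clamped against rounding error.
    cos_t = max(-1.0, min(1.0, (a * a + b * b - c * c) / (2 * a * b)))
    ang_key = int(math.degrees(math.acos(cos_t)) // angle_bin)
    # Pack labels and bins into a single integer.
    return ((lab[0] * 100 + lab[1]) * 100 + lab[2]) * 10_000 + len_key * 100 + ang_key


def structure_keys(labels, coords):
    """Multiset of triangle keys over every 3-point subset of a structure."""
    return Counter(
        triangle_key([labels[i] for i in t], [coords[i] for i in t])
        for t in itertools.combinations(range(len(labels)), 3)
    )


def key_similarity(k1, k2):
    """Generalised Jaccard similarity between two key multisets."""
    union = sum((k1 | k2).values())
    return sum((k1 & k2).values()) / union if union else 1.0


coords = [(0.0, 0.0, 0.0), (3.7, 0.4, 0.2), (0.5, 3.6, 0.3), (1.1, 1.3, 2.9)]
labels = [1, 2, 3, 4]
keys = structure_keys(labels, coords)
rotated = [(-y, x, z) for x, y, z in coords]  # 90-degree rotation about z
print(key_similarity(keys, structure_keys(labels, rotated)))
```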

Results: The key contributions and findings are: (i) a novel TSR-based method was developed for representing the 3D structures of pigments and for quantifying pigment structures; (ii) the redox cofactor P700 is structurally conserved and differs from the other redox cofactors, and similar patterns were observed for A-1 and A0; (iii) the redox cofactors P700, A-1, A0 and A1, as well as their binding sites, differ structurally between the A and B branches; (iv) the tryptophan residues close to A0 and A1 are structurally conserved; (v) the TSR-based method outperforms the Root Mean Square Deviation (RMSD) and Ultrafast Shape Recognition (USR) methods.
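For context on one of the baselines, here is a minimal sketch of the USR idea as it is commonly described: pick four reference points (the centroid, the point closest to it, the point farthest from it, and the point farthest from that one), summarise each point-to-reference distance distribution by its mean, variance and skewness, and compare the resulting 12-number descriptors with S = 1 / (1 + mean absolute difference). Implementations differ in how they normalise the higher moments, so the choices here are assumptions, and this is not the comparison setup used in the paper.

```python
import math


def _moments(values):
    """Mean, variance and skewness of a list of distances."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / n
    third = sum((v - mean) ** 3 for v in values) / n
    skew = third / var ** 1.5 if var > 0 else 0.0
    return [mean, var, skew]


def usr_descriptor(coords):
    """12-number shape descriptor from four reference points."""
    n = len(coords)
    ctd = tuple(sum(c[i] for c in coords) / n for i in range(3))
    d_ctd = [math.dist(c, ctd) for c in coords]
    cst = coords[d_ctd.index(min(d_ctd))]  # closest point to the centroid
    fct = coords[d_ctd.index(max(d_ctd))]  # farthest point from the centroid
    d_fct = [math.dist(c, fct) for c in coords]
    ftf = coords[d_fct.index(max(d_fct))]  # farthest point from fct
    desc = []
    for ref in (ctd, cst, fct, ftf):
        desc.extend(_moments([math.dist(c, ref) for c in coords]))
    return desc


def usr_similarity(a, b):
    """1 for identical descriptors, approaching 0 as they diverge."""
    return 1.0 / (1.0 + sum(abs(x - y) for x, y in zip(a, b)) / len(a))


coords = [(0.0, 0.0, 0.0), (1.5, 0.2, 0.1), (2.9, 1.1, 0.4), (0.7, 2.2, 1.3), (1.8, 1.9, 2.6)]
desc = usr_descriptor(coords)
shifted = [(x + 3.0, y, z) for x, y, z in coords]
print(len(desc), usr_similarity(desc, usr_descriptor(shifted)))
```

Unlike RMSD, USR needs no atom correspondence or superposition, which is why it is fast; the trade-off is that very different shapes can share similar distance moments.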

Conclusions: The structural analyses of redox cofactors and their binding sites provide a foundation for understanding the unique chemical and physical properties of each redox cofactor in PS I, which are essential for modulating the rate and direction of energy and electron transfers.

BMC Bioinformatics 26(1):15. Full text: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11731568/pdf/
Citations: 0
DTI-MHAPR: optimized drug-target interaction prediction via PCA-enhanced features and heterogeneous graph attention networks.
IF 2.9 | Zone 3, Biology | Q2 BIOCHEMICAL RESEARCH METHODS | Pub Date: 2025-01-13 | DOI: 10.1186/s12859-024-06021-z
Guang Yang, Yinbo Liu, Sijian Wen, Wenxi Chen, Xiaolei Zhu, Yongmei Wang

Drug-target interactions (DTIs) are pivotal in drug discovery and development, and identifying them accurately can significantly expedite the process. Numerous DTI prediction methods have emerged, yet many fail to fully exploit the feature information of drugs and targets or to address feature redundancy. We aim to improve DTI prediction accuracy by eliminating redundant features and exploiting the node topological structure to enhance feature extraction. To this end, we introduce a PCA-augmented, multi-layer heterogeneous graph-based network that concentrates on key features throughout the encoding-decoding phase. Our approach begins by constructing a heterogeneous graph from various similarity metrics, which is then encoded by a graph neural network. We concatenate and integrate the resulting representation vectors to merge multi-level information. Principal component analysis is then applied to distill the most informative features, and a random forest performs the final decoding of the integrated data. In extensive experiments, our method outperforms six baseline models in accuracy. Comprehensive ablation studies, visualization of results, and in-depth case analyses further validate the framework's efficacy and interpretability, providing a novel tool for drug discovery that integrates multimodal features.
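The concatenate-then-PCA step described above can be sketched with NumPy alone. The random matrices stand in for per-pair embeddings from two GNN layers, and the sizes (100 pairs, 16 dimensions per layer, 8 retained components) are assumptions; the abstract does not give the real dimensions. The reduced features would then go to a random forest classifier, for example scikit-learn's `RandomForestClassifier`, which is left out here to keep the sketch dependency-light.

```python
import numpy as np


def pca_reduce(x, n_components):
    """Project rows of x onto their top principal components (via SVD)."""
    centered = x - x.mean(axis=0)
    # Rows of vt are the principal axes, ordered by decreasing singular value.
    _, s, vt = np.linalg.svd(centered, full_matrices=False)
    explained = (s ** 2) / np.sum(s ** 2)
    return centered @ vt[:n_components].T, explained[:n_components]


rng = np.random.default_rng(42)
# Hypothetical embeddings of 100 drug-target pairs from two GNN layers.
layer1 = rng.normal(size=(100, 16))
layer2 = rng.normal(size=(100, 16))
pair_features = np.concatenate([layer1, layer2], axis=1)  # multi-level merge
reduced, ratio = pca_reduce(pair_features, n_components=8)
print(reduced.shape, ratio.sum())
```

The retained components are mutually uncorrelated, which is the sense in which PCA removes redundancy among the concatenated features before decoding.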

BMC Bioinformatics 26(1):11. Full text: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11726937/pdf/
Citations: 0
CompàreGenome: a command-line tool for genomic diversity estimation in prokaryotes and eukaryotes.
IF 2.9 | Zone 3, Biology | Q2 BIOCHEMICAL RESEARCH METHODS | Pub Date: 2025-01-13 | DOI: 10.1186/s12859-025-06036-0
Gabriele Moro, Rossano Atzeni, Ali Al-Subhi, Maria Giovanna Marche

Background: The increasing availability of sequenced genomes has enabled comparative analyses of various organisms. Numerous tools and online platforms have been developed for this purpose, facilitating the identification of unique features within selected organisms. However, in the initial stages of analysis it can be unclear which tools are most appropriate, and matching a tool to the specific characteristics of the data often takes multiple attempts. Here, we introduce CompàreGenome, a command-line tool designed specifically for genomic diversity estimation. Suitable for both prokaryotes and eukaryotes, the tool is particularly valuable in the early stages of a study, when little is known about the genetic differences or similarities among the organisms being compared.

Results: In all the tests conducted, CompàreGenome successfully identified specific genetic features of the selected organisms, detected the most conserved genes, pinpointed highly divergent ones, and functionally annotated these genes. This provided insights into biological processes, molecular functions, and cellular components associated with each gene. The tool also distinguished organisms at the strain level and quantified genetic distances using three distinct analytical methods.
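The abstract says CompàreGenome quantifies genetic distances with three analytical methods but does not name them. As an illustration of what a pairwise genetic distance can look like, the sketch computes the uncorrected p-distance and the Jukes-Cantor (JC69) corrected distance between two aligned sequences. These two measures are assumed examples, not necessarily the methods the tool uses.

```python
import math


def p_distance(seq1: str, seq2: str) -> float:
    """Proportion of differing sites between two aligned sequences, skipping gaps."""
    pairs = [(a, b) for a, b in zip(seq1.upper(), seq2.upper()) if a != "-" and b != "-"]
    if not pairs:
        raise ValueError("no comparable sites")
    return sum(a != b for a, b in pairs) / len(pairs)


def jukes_cantor(seq1: str, seq2: str) -> float:
    """JC69 corrected distance d = -3/4 * ln(1 - 4p/3)."""
    p = p_distance(seq1, seq2)
    if p >= 0.75:
        return math.inf  # saturated: the JC69 distance is undefined
    return -0.75 * math.log(1 - 4 * p / 3)


print(p_distance("ACGTACGTAC", "ACGTACGTAA"), jukes_cantor("ACGTACGTAC", "ACGTACGTAA"))
```

The correction grows with p because, at higher divergence, more sites have undergone multiple substitutions that a raw mismatch count cannot see.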

Conclusion: CompàreGenome empowers users to explore genomic differences among organisms, translating technical outputs from various tools into actionable insights for biologists. While primarily tested on small microbial genomes, the tool has potential applications for larger genomes. CompàreGenome is implemented in Bash, R, and Python and is freely available under an LGPL-2.1 license.

BMC Bioinformatics 26(1):14. Full text: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11731138/pdf/
Citations: 0
Solu: a cloud platform for real-time genomic pathogen surveillance.
IF 2.9 | Zone 3, Biology | Q2 BIOCHEMICAL RESEARCH METHODS | Pub Date: 2025-01-13 | DOI: 10.1186/s12859-024-06005-z
Timo Saratto, Kerkko Visuri, Jonatan Lehtinen, Irene Ortega-Sanz, Jacob L Steenwyk, Samuel Sihvonen

Background: Genomic surveillance is widely used to track public health outbreaks and healthcare-associated pathogens. Despite advances in bioinformatics pipelines, continuous surveillance still faces significant challenges in infrastructure, expertise, and security. Existing pipelines often require users to set up and manage their own infrastructure, and they are not designed for continuous surveillance, which requires integrating new, regularly generated sequencing data with previous analyses. Additionally, academic projects often do not meet the privacy requirements of healthcare providers.
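The requirement to integrate each new batch of sequencing data with previous analyses can be illustrated with SNP-distance clustering, one common approach in outbreak surveillance. In the sketch below, each new sample is compared only against samples already analysed and clusters are updated in place, instead of re-running the whole analysis. The class `SurveillanceIndex`, the SNP threshold, and single-linkage merging are hypothetical choices for illustration; the abstract does not describe how Solu implements this.

```python
def snp_distance(a: str, b: str) -> int:
    """Number of differing positions between two aligned consensus
    sequences, ignoring positions where either has an ambiguous 'N'."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b) if x != "N" and y != "N")


class SurveillanceIndex:
    """Hypothetical incremental store: new samples are compared only against
    already-analysed samples, and clusters are updated in place."""

    def __init__(self, threshold: int):
        self.threshold = threshold
        self.samples = {}  # sample id -> aligned sequence
        self.cluster = {}  # sample id -> cluster id

    def add(self, sample_id: str, seq: str) -> str:
        close = [sid for sid, s in self.samples.items()
                 if snp_distance(seq, s) <= self.threshold]
        linked = {self.cluster[sid] for sid in close}
        cluster_id = min(linked) if linked else sample_id
        # Single linkage: a new sample close to several clusters merges them.
        for sid, cid in self.cluster.items():
            if cid in linked:
                self.cluster[sid] = cluster_id
        self.samples[sample_id] = seq
        self.cluster[sample_id] = cluster_id
        return cluster_id


index = SurveillanceIndex(threshold=2)
for sid, seq in [("s1", "ACGTACGT"), ("s2", "ACGTACGA"), ("s3", "TTGTTCGA")]:
    print(sid, index.add(sid, seq))
```

Each call costs one comparison per stored sample, so a daily batch adds work proportional to its size rather than reprocessing the full history.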

Results: We present Solu, a cloud-based platform that integrates genomic data into a real-time, privacy-focused surveillance system.

Evaluation: Solu's accuracy for taxonomy assignment, antimicrobial resistance genes, and phylogenetics was comparable to established pathogen surveillance pipelines. In some cases, Solu identified antimicrobial resistance genes that were previously undetected. Together, these findings demonstrate the efficacy of our platform.

Conclusions: By enabling reliable, user-friendly, and privacy-focused genomic surveillance, Solu has the potential to bridge the gap between cutting-edge research and practical, widespread application in healthcare settings. The platform is available for free academic use at https://platform.solugenomics.com .

BMC Bioinformatics 26(1):12 (2025-01-13). doi:10.1186/s12859-024-06005-z. Full text (PMC PDF): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11731562/pdf/
Citations: 0
MDFGNN-SMMA: prediction of potential small molecule-miRNA associations based on multi-source data fusion and graph neural networks.
IF 2.9 | CAS Zone 3 (Biology) | Q2 BIOCHEMICAL RESEARCH METHODS | Pub Date: 2025-01-13 | DOI: 10.1186/s12859-025-06040-4
Jianwei Li, Xukun Zhang, Bing Li, Ziyu Li, Zhenzhen Chen

Background: MicroRNAs (miRNAs) are pivotal in the initiation and progression of complex human diseases and have been identified as targets for small molecule (SM) drugs. However, the expensive and time-intensive characteristics of conventional experimental techniques for identifying SM-miRNA associations highlight the necessity for efficient computational methodologies in this field.

Results: In this study, we proposed a deep learning method called Multi-source Data Fusion and Graph Neural Networks for Small Molecule-MiRNA Association (MDFGNN-SMMA) to predict potential SM-miRNA associations. Firstly, MDFGNN-SMMA extracted features of Atom Pairs fingerprints and Molecular ACCess System fingerprints to derive fusion feature vectors for small molecules (SMs). The K-mer features were employed to generate the initial feature vectors for miRNAs. Secondly, cosine similarity measures were computed to construct the adjacency matrices for SMs and miRNAs, respectively. Thirdly, these feature vectors and adjacency matrices were input into a model comprising GAT and GraphSAGE, which were utilized to generate the final feature vectors for SMs and miRNAs. Finally, the averaged final feature vectors were utilized as input for a multilayer perceptron to predict the associations between SMs and miRNAs.
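
The feature-construction and graph steps in the Results paragraph above can be sketched end to end in plain Python: k-mer frequency vectors for miRNAs, a cosine-similarity adjacency matrix, and one GraphSAGE layer with the mean aggregator. This is an illustrative sketch under stated assumptions, not the authors' implementation, and all helper names are hypothetical. The abstract does not give the value of k, whether k-mer counts are normalized, or how the similarity matrix becomes graph edges, so k = 3, frequency normalization, and the 0.5 neighbor threshold are assumptions. The GAT branch, training, and the final multilayer perceptron are omitted, and identity matrices stand in for learned weights. The same adjacency helpers would apply to small-molecule vectors built from Atom Pairs and MACCS fingerprints, but computing those fingerprints requires a cheminformatics toolkit and is left out.

```python
# Illustrative sketch only: helper names, k = 3 and the 0.5 threshold are assumptions,
# not MDFGNN-SMMA's actual code or settings.
from itertools import product
from math import sqrt


def kmer_vector(seq: str, k: int = 3, alphabet: str = "ACGU") -> list:
    """Normalized k-mer frequency vector over all len(alphabet)**k possible k-mers."""
    kmers = ["".join(p) for p in product(alphabet, repeat=k)]
    index = {kmer: i for i, kmer in enumerate(kmers)}
    counts = [0.0] * len(kmers)
    seq = seq.upper().replace("T", "U")  # accept DNA-style input by mapping T to U
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if kmer in index:  # skip windows containing ambiguous bases such as N
            counts[index[kmer]] += 1.0
    total = sum(counts)
    return [c / total for c in counts] if total else counts


def cosine(a: list, b: list) -> float:
    """Cosine similarity; defined as 0.0 when either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def cosine_adjacency(vectors: list) -> list:
    """Dense pairwise cosine-similarity matrix."""
    return [[cosine(a, b) for b in vectors] for a in vectors]


def neighbors_from_adjacency(adj: list, threshold: float) -> list:
    """Node j is a neighbor of i when adj[i][j] >= threshold (self-loops excluded)."""
    return [[j for j, s in enumerate(row) if j != i and s >= threshold]
            for i, row in enumerate(adj)]


def matvec(W: list, x: list) -> list:
    """Matrix-vector product W @ x with plain lists."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def sage_mean_layer(H: list, neighbors: list, W_self: list, W_neigh: list) -> list:
    """One GraphSAGE layer with the mean aggregator:
    h_v' = ReLU(W_self h_v + W_neigh * mean_{u in N(v)} h_u).
    Two matrices are equivalent to one matrix applied to [h_v || mean].
    The per-layer output normalization of the original GraphSAGE is omitted.
    """
    out = []
    for v, h in enumerate(H):
        nbrs = neighbors[v]
        if nbrs:
            agg = [sum(H[u][d] for u in nbrs) / len(nbrs) for d in range(len(h))]
        else:
            agg = [0.0] * len(h)  # isolated node: only its own features contribute
        z = [a + b for a, b in zip(matvec(W_self, h), matvec(W_neigh, agg))]
        out.append([max(0.0, zi) for zi in z])  # ReLU
    return out


if __name__ == "__main__":
    # Three short example RNA sequences; the first two share their first 16 nucleotides.
    seqs = ["UGAGGUAGUAGGUUGUAUAGUU", "UGAGGUAGUAGGUUGUGUGGUU", "UAGCUUAUCAGACUGAUGUUGA"]
    feats = [kmer_vector(s) for s in seqs]
    adj = cosine_adjacency(feats)
    nbrs = neighbors_from_adjacency(adj, threshold=0.5)  # hypothetical cutoff
    print([[round(x, 3) for x in row] for row in adj])
    print(nbrs)  # [[1], [0], []]
    # Identity weights stand in for learned parameters (64 = 4**3 features).
    eye = [[1.0 if i == j else 0.0 for j in range(64)] for i in range(64)]
    H1 = sage_mean_layer(feats, nbrs, eye, eye)
    print(len(H1), len(H1[0]))
```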

Conclusions: The performance of MDFGNN-SMMA was assessed using 10-fold cross-validation, in which it outperformed four state-of-the-art models in terms of both AUC and AUPR. Results on an independent test set further confirmed the model's generalization capability. In addition, three case studies supported the efficacy of MDFGNN-SMMA: among the top 50 predicted miRNAs associated with Cisplatin, 5-Fluorouracil, and Doxorubicin, 42, 36, and 36 miRNAs, respectively, were corroborated by existing literature and the RNAInter database.
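
The evaluation described above (10-fold cross-validation scored with AUC and AUPR, followed by top-50 case studies) can be illustrated with a small pure-Python sketch. It is not the authors' evaluation code: the fold assignment is a simple shuffled split, AUPR is estimated by average precision (one common estimator; others interpolate the precision-recall curve with the trapezoidal rule), and the toy labels, scores, and helper names are hypothetical. The case-study counts reduce to simple proportions: 42/50 = 0.84 for Cisplatin and 36/50 = 0.72 for both 5-Fluorouracil and Doxorubicin.

```python
# Illustrative evaluation sketch; toy data and helper names are hypothetical,
# not the MDFGNN-SMMA evaluation code.
import random


def k_fold_splits(n: int, k: int = 10, seed: int = 0):
    """Yield (train, test) index lists; each index lands in exactly one test fold."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    for i in range(k):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, folds[i]


def roc_auc(labels: list, scores: list) -> float:
    """AUC as the probability that a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for q in neg:
            wins += 1.0 if p > q else 0.5 if p == q else 0.0
    return wins / (len(pos) * len(neg))


def average_precision(labels: list, scores: list) -> float:
    """Average precision, used here as an AUPR estimate: mean precision at each positive's rank."""
    ranked = sorted(zip(scores, labels), key=lambda t: t[0], reverse=True)
    n_pos = sum(labels)
    if n_pos == 0:
        raise ValueError("average precision needs at least one positive")
    hits, total = 0, 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / rank
    return total / n_pos


def top_k_confirmation_rate(confirmed: int, k: int = 50) -> float:
    """Share of the top-k predictions supported by literature or databases."""
    return confirmed / k


if __name__ == "__main__":
    labels = [1, 0, 1, 0, 1]
    scores = [0.9, 0.8, 0.7, 0.3, 0.2]
    print(roc_auc(labels, scores))  # 0.5
    print(round(average_precision(labels, scores), 4))  # 0.7556
    for drug, hits in [("Cisplatin", 42), ("5-Fluorouracil", 36), ("Doxorubicin", 36)]:
        print(drug, top_k_confirmation_rate(hits))
```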

BMC Bioinformatics 26(1):13 (2025-01-13). doi:10.1186/s12859-025-06040-4. Full text (PMC PDF): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11730471/pdf/
Citations: 0
Not seeing the trees for the forest. The impact of neighbours on graph-based configurations in histopathology.
IF 2.9 | CAS Zone 3 (Biology) | Q2 BIOCHEMICAL RESEARCH METHODS | Pub Date: 2025-01-11 | DOI: 10.1186/s12859-024-06007-x
Olga Fourkioti, Matt De Vries, Reed Naidoo, Chris Bakal

Background: Deep learning (DL) has set new standards in cancer diagnosis, significantly enhancing the accuracy of automated classification of whole slide images (WSIs) derived from biopsied tissue samples. To enable DL models to process these large images, WSIs are typically divided into thousands of smaller tiles, each containing 10-50 cells. Multiple Instance Learning (MIL) is a commonly used approach, where WSIs are treated as bags comprising numerous tiles (instances) and only bag-level labels are provided during training. The model learns from these broad labels to extract more detailed, instance-level insights. However, biopsied sections often exhibit high intra- and inter-phenotypic heterogeneity, presenting a significant challenge for classification. To address this, many graph-based methods have been proposed, where each WSI is represented as a graph with tiles as nodes and edges defined by specific spatial relationships.
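
The MIL setup described above, in which a slide (bag) carries a label but its tiles (instances) do not, hinges on a pooling step that turns tile-level outputs into a single slide-level prediction. The sketch below shows three common pooling choices in plain Python: max pooling (the classic MIL assumption that a bag is positive if any instance is), mean pooling, and a simplified single-vector form of attention pooling. It is illustrative only: the abstract does not say which pooling K-MIL uses, and the tile scores, the weight vector `w`, and the 0.5 decision threshold are hypothetical.

```python
# Illustrative MIL pooling sketch; K-MIL's actual pooling is not stated in the abstract,
# and the scores, weight vector w and 0.5 threshold are hypothetical.
from math import exp


def max_pool(instance_scores: list) -> float:
    """Classic MIL assumption: a bag is as positive as its most positive instance."""
    return max(instance_scores)


def mean_pool(instance_scores: list) -> float:
    """Every instance contributes equally to the bag score."""
    return sum(instance_scores) / len(instance_scores)


def attention_pool(instance_feats: list, w: list):
    """Simplified attention pooling: softmax over w . h_i, then a weighted sum of the h_i.
    Returns (bag_embedding, attention_weights)."""
    logits = [sum(wj * hj for wj, hj in zip(w, h)) for h in instance_feats]
    m = max(logits)  # subtract the max before exp() for numerical stability
    exps = [exp(z - m) for z in logits]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(instance_feats[0])
    bag = [sum(a * h[d] for a, h in zip(weights, instance_feats)) for d in range(dim)]
    return bag, weights


def predict_bag(instance_scores: list, threshold: float = 0.5) -> bool:
    """Slide-level call from tile scores via max pooling."""
    return max_pool(instance_scores) >= threshold


if __name__ == "__main__":
    tile_scores = [0.05, 0.10, 0.92, 0.08]  # hypothetical per-tile tumor probabilities
    print(max_pool(tile_scores), round(mean_pool(tile_scores), 4), predict_bag(tile_scores))  # 0.92 0.2875 True
    bag, weights = attention_pool([[1.0, 0.0], [0.0, 1.0]], w=[2.0, 0.0])
    print([round(x, 3) for x in weights])  # [0.881, 0.119]
```

Max pooling lets one suspicious tile decide the slide, while mean pooling dilutes it across the bag; attention pooling sits between the two by learning which tiles to weight.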

Results: In this study, we investigate how different graph configurations, varying in connectivity and neighborhood structure, affect the performance of MIL models. We developed a novel pipeline, K-MIL, to evaluate the impact of contextual information on cell classification performance. By incorporating neighboring tiles into the analysis, we examined whether contextual information improves or impairs the network's ability to identify patterns and features critical for accurate classification. Our experiments were conducted on two datasets: the COLON cancer dataset and the UCSB dataset.
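
One way to vary connectivity and neighborhood structure on a tiled slide is to place tiles on a grid and change which nearby tiles count as neighbors. The sketch below is a toy illustration under that assumption, not the K-MIL pipeline: it builds 4- or 8-style neighborhoods of a chosen radius and averages each tile's score with its neighbors' scores in one round of mean aggregation. The grid layout, the scalar scores, and the helper names are hypothetical.

```python
# Toy sketch of grid-based tile neighborhoods; not the K-MIL implementation.
# Grid layout, scalar tile scores and helper names are hypothetical.


def grid_neighbors(rows: int, cols: int, radius: int = 1, diagonal: bool = True) -> dict:
    """Neighbor lists for tiles on a rows x cols grid.

    diagonal=True  -> tiles within Chebyshev distance `radius` (8-connectivity at radius 1)
    diagonal=False -> tiles within Manhattan distance `radius` (4-connectivity at radius 1)
    """
    nbrs = {}
    for r in range(rows):
        for c in range(cols):
            out = []
            for dr in range(-radius, radius + 1):
                for dc in range(-radius, radius + 1):
                    if (dr, dc) == (0, 0):
                        continue
                    if not diagonal and abs(dr) + abs(dc) > radius:
                        continue
                    rr, cc = r + dr, c + dc
                    if 0 <= rr < rows and 0 <= cc < cols:  # stay inside the slide
                        out.append((rr, cc))
            nbrs[(r, c)] = out
    return nbrs


def context_scores(scores: dict, nbrs: dict) -> dict:
    """One round of mean aggregation: each tile's score averaged with its neighbors' scores."""
    ctx = {}
    for tile, s in scores.items():
        group = [s] + [scores[n] for n in nbrs[tile]]
        ctx[tile] = sum(group) / len(group)
    return ctx


if __name__ == "__main__":
    # Hypothetical 5 x 5 slide: one tumor-like tile (score 1.0) among 0.0 tiles.
    scores = {(r, c): 0.0 for r in range(5) for c in range(5)}
    scores[(2, 2)] = 1.0
    for label, nb in [
        ("4-neighborhood, radius 1", grid_neighbors(5, 5, 1, diagonal=False)),
        ("8-neighborhood, radius 1", grid_neighbors(5, 5, 1)),
        ("8-neighborhood, radius 2", grid_neighbors(5, 5, 2)),
    ]:
        # Context score of the center tile: 1/5, 1/9 and 1/25 respectively.
        print(label, round(context_scores(scores, nb)[(2, 2)], 3))
```

In this toy run a lone high-scoring tile drops from 1.0 to 0.2, 0.111, and 0.04 as the neighborhood widens, which is one simple way added context can wash out a local signal; the abstract does not say whether this mechanism explains the tile-level results reported for K-MIL.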

Conclusions: Our results indicate that while incorporating more spatial context information generally improves model accuracy at both the bag and tile levels, the improvement at the tile level is not linear. In some instances, increasing spatial context leads to misclassification, suggesting that more context is not always beneficial. This finding highlights the need for careful consideration when incorporating spatial context information in digital pathology classification tasks.

BMC Bioinformatics 26(1):9 (2025-01-11). doi:10.1186/s12859-024-06007-x. Full text (PMC PDF): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11724494/pdf/
Citations: 0
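Book学术
Contact: info@booksci.cn
Book学术 provides a free academic search service that helps scholars in China and abroad find Chinese- and English-language literature, and is committed to offering the most convenient, high-quality service.
Copyright © 2023 Book学术 All rights reserved.
Beijing Public Network Security Filing No. 11010802042870 | Beijing ICP Filing No. 2023020795-1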