
PLoS Computational Biology: latest articles

Zimin patterns in genomes.
IF 3.6 Zone 2 (Biology) Q1 BIOCHEMICAL RESEARCH METHODS Pub Date : 2026-02-09 DOI: 10.1371/journal.pcbi.1013909
Nikol Chantzi, Ioannis Mouratidis, Ilias Georgakopoulos-Soares

Zimin words are words that have the same prefix and suffix. They are unavoidable patterns: every sufficiently long string contains them. Here, we examine for the first time the presence in the human genome of k-mers that do not contain any Zimin pattern, hereafter called Zimin avoidmers. We report that in the human reference genome, all k-mers longer than 104 base pairs contain Zimin words. Zimin avoidmers are most enriched in coding regions and Human Satellite 1 regions of the human genome, and they show a depletion of germline insertions and deletions relative to surrounding genomic regions. We also apply our methodology to the genomes of eight other model organisms spanning all three domains of life, finding large differences in their Zimin avoidmer frequencies and genomic localization preferences. Zimin avoidmers have the highest genomic density in prokaryotes, with E. coli showing particularly high levels, and the lowest density in eukaryotes, with D. rerio the lowest. Among the studied genomes, the longest k-mer length at which Zimin avoidmers are observed is 115 base pairs, in S. cerevisiae. We conclude that Zimin avoidmers are inhomogeneously distributed in organismal genomes, have intricate properties including lower insertion and deletion rates, and, across the genomes studied, disappear at shorter k-mer lengths than theoretically expected.
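
The abstract defines Zimin avoidmers only informally. The sketch below assumes the standard recursive definition of Zimin words (Z_1 = x1, Z_{n+1} = Z_n x_{n+1} Z_n) and assumes that a k-mer "contains" Z_n when some substring of it is an instance of Z_n under a non-erasing substitution. Which order n the authors test is not stated here, so `n` is left as a parameter; all function names are hypothetical, and this brute-force check is not the authors' genome-scale method.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def is_zimin_instance(word: str, n: int) -> bool:
    """True if `word` is an instance of the Zimin word Z_n.

    Z_1 = x1 and Z_{n+1} = Z_n x_{n+1} Z_n, so (for a non-erasing substitution)
    `word` is an instance of Z_{n+1} exactly when word = U V U, where U is an
    instance of Z_n and V is non-empty.
    """
    if n == 1:
        return len(word) >= 1
    # U must leave room for a non-empty middle V: 2 * len(U) + 1 <= len(word).
    for u_len in range(1, (len(word) - 1) // 2 + 1):
        u = word[:u_len]
        if u == word[-u_len:] and is_zimin_instance(u, n - 1):
            return True
    return False


def contains_zimin(word: str, n: int) -> bool:
    """True if some substring of `word` is an instance of Z_n."""
    length = len(word)
    return any(
        is_zimin_instance(word[i:j], n)
        for i in range(length)
        for j in range(i + 1, length + 1)
    )


def zimin_avoidmers(sequence: str, k: int, n: int) -> set:
    """Distinct k-mers of `sequence` that contain no instance of Z_n."""
    kmers = {sequence[i:i + k] for i in range(len(sequence) - k + 1)}
    return {kmer for kmer in kmers if not contains_zimin(kmer, n)}
```

For example, `zimin_avoidmers("AACCGGTTA", 8, 2)` returns `{'AACCGGTT'}`: the other 8-mer, `ACCGGTTA`, is itself an instance of x y x with x = `A`. The check enumerates all O(k^2) substrings, which is fine for single k-mers but would need a smarter scan for whole genomes.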

PLoS Computational Biology, 22(2): e1013909
Citations: 0
Trainable subnetworks reveal insights into structure knowledge organization in protein language models.
IF 3.6 Zone 2 (Biology) Q1 BIOCHEMICAL RESEARCH METHODS Pub Date : 2026-02-09 DOI: 10.1371/journal.pcbi.1013925
Ria Vinod, Ava P Amini, Lorin Crawford, Kevin K Yang

Protein language models (PLMs) pretrained via a masked language modeling objective have proven effective across a range of structure-related tasks, including high-resolution structure prediction. However, it remains unclear to what extent these models factorize protein structural categories among their learned parameters. In this work, we introduce trainable subnetworks, which mask out the PLM weights responsible for language modeling performance on a structural category of proteins. We systematically trained 39 PLM subnetworks targeting both sequence- and residue-level features at varying degrees of resolution using annotations defined by the CATH taxonomy and secondary structure elements. Using these PLM subnetworks, we assessed how structural factorization in PLMs influences downstream structure prediction. Our results show that PLMs are highly sensitive to sequence-level features and can predominantly disentangle extremely coarse or fine-grained information. Furthermore, we observe that structure prediction is highly responsive to factorized PLM representations and that small changes in language modeling performance can significantly impair PLM-based structure prediction capabilities. Our work presents a framework for studying feature entanglement within pretrained PLMs and can be leveraged to improve the alignment of learned PLM representations with known biological concepts.
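
The abstract does not give the subnetwork training objective. As a toy illustration of the general idea (a learned, differentiable mask over frozen weights that degrades performance on one category while preserving it elsewhere), the sketch below assumes a linear model in place of a PLM, squared error in place of the masked-language-modeling loss, and an assumed objective L_retained - L_target + keep_penalty * sum(1 - m). `train_mask` and all of its parameters are hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict(weights, mask, x):
    """Frozen linear model whose weights are gated elementwise by `mask`."""
    return sum(w * m * xi for w, m, xi in zip(weights, mask, x))


def mse(weights, mask, data):
    return sum((predict(weights, mask, x) - y) ** 2 for x, y in data) / len(data)


def mse_grad_wrt_mask(weights, mask, data):
    """d(MSE)/d(mask_j); the weights themselves are never updated."""
    grad = [0.0] * len(weights)
    for x, y in data:
        residual = predict(weights, mask, x) - y
        for j, (w, xi) in enumerate(zip(weights, x)):
            grad[j] += 2.0 * residual * w * xi
    return [g / len(data) for g in grad]


def train_mask(weights, target, retained, steps=2000, lr=0.5,
               keep_penalty=0.1, init_score=2.0):
    """Learn m = sigmoid(s) minimising L_retained - L_target + keep_penalty * sum(1 - m)."""
    scores = [init_score] * len(weights)
    for _ in range(steps):
        mask = [sigmoid(s) for s in scores]
        g_retained = mse_grad_wrt_mask(weights, mask, retained)
        g_target = mse_grad_wrt_mask(weights, mask, target)
        for j, m in enumerate(mask):
            # -keep_penalty is the gradient of keep_penalty * (1 - m_j).
            d_objective = g_retained[j] - g_target[j] - keep_penalty
            scores[j] -= lr * d_objective * m * (1.0 - m)  # chain rule through sigmoid
    return [sigmoid(s) for s in scores]
```

With weights `[1.0, 1.0]`, a target set that only uses feature 0 and a retained set that only uses feature 1, the learned mask drives entry 0 toward 0 and entry 1 toward 1, i.e. the "subnetwork" removes exactly the weight the target category depends on. The keep penalty discourages masking weights that do not affect the target loss.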

PLoS Computational Biology, 22(2): e1013925
Citations: 0
Introducing gold-standard essential gene datasets for Pseudomonas aeruginosa to enhance Tn-Seq analyses.
IF 3.6 Zone 2 (Biology) Q1 BIOCHEMICAL RESEARCH METHODS Pub Date : 2026-02-09 DOI: 10.1371/journal.pcbi.1013945
Cléophée Van Maele, Ségolène Caboche, Nathan Nicolau-Guillaumet, Anaëlle Muggeo, Thomas Guillard

Transposon sequencing (Tn-Seq) is a high-throughput technique that uses transposon mutant libraries to assess gene fitness or essentiality under specific conditions, potentially identifying novel therapeutic targets. However, the diversity of statistical methods, bioinformatics tools, and parameters complicates the selection of the most appropriate and reliable analysis pipeline for a given dataset. A significant limitation of existing studies is the absence of a gold-standard set of essential genes (EGs) for evaluating the analysis process. Relying on the original study as a gold standard is suboptimal, as its results may have been obtained with non-optimal tools. Here, we introduce reliable EG datasets for Pseudomonas aeruginosa to enhance Tn-Seq analyses. Using literature data and sequencing of six samples from PA14 wild-type (WT) and PA14 OprD-deficient (ΔoprD) strains grown in LB medium, we compared EG lists generated by several statistical methods in TRANSIT2 and by the FiTnEss tool. We established a reference dataset of 84 genes found in P. aeruginosa and another gold-standard set of 115 genes specific to PA14 grown in LB. Depending on the analysis method used, retrieval rates of gold-standard genes ranged from 0% to 100%. The hidden Markov model (HMM) method available in TRANSIT2 identified approximately 90% of gold-standard EGs, while FiTnEss identified up to 100%. This study addresses a critical gap in the field by providing gold-standard EG sets that enable comparative evaluation of Tn-Seq analysis methods and help researchers select the most suitable bioinformatics pipeline for a given Tn-Seq dataset. We anticipate that our results will facilitate comparisons of Tn-Seq analyses, harmonize P. aeruginosa studies, promote standardization, and enhance reproducibility. Ultimately, this will lead to more reliable identification of EGs and potential therapeutic targets in P. aeruginosa, advancing our understanding of this important pathogen.
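
The retrieval rate reported above is the fraction of gold-standard EGs that a pipeline calls essential. A minimal sketch of that comparison, assuming each method's output is simply a list of gene identifiers; the function names and gene IDs are hypothetical, and the second metric (share of calls inside the gold standard) is an added assumption, since genes outside the gold standard may still be truly essential.

```python
def retrieval_rate(predicted_essential, gold_standard):
    """Fraction of gold-standard essential genes recovered by one pipeline."""
    gold = set(gold_standard)
    if not gold:
        raise ValueError("gold-standard set must not be empty")
    return len(gold & set(predicted_essential)) / len(gold)


def share_in_gold(predicted_essential, gold_standard):
    """Fraction of a pipeline's essential-gene calls that fall in the gold standard."""
    calls = set(predicted_essential)
    return len(calls & set(gold_standard)) / len(calls) if calls else 0.0


def compare_pipelines(calls_by_method, gold_standard):
    """(method, retrieval rate, share in gold) tuples, best retrieval rate first."""
    rows = [
        (method, retrieval_rate(calls, gold_standard), share_in_gold(calls, gold_standard))
        for method, calls in calls_by_method.items()
    ]
    return sorted(rows, key=lambda row: row[1], reverse=True)
```

Reporting both numbers matters because retrieval rate alone rewards a method that calls almost every gene essential.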

PLoS Computational Biology, 22(2): e1013945
Citations: 0
Cell atlases and the developmental foundations of the phenotype.
IF 3.6 Zone 2 (Biology) Q1 BIOCHEMICAL RESEARCH METHODS Pub Date : 2026-02-09 DOI: 10.1371/journal.pcbi.1013944
Alicia Lou, Mónica Chagoyen, Juan F Poyatos

It is widely acknowledged that development shapes phenotypes, yet the extent to which genes with similar expression patterns during development lead to equivalent organismal phenotypes when mutated remains unclear. Here, we propose addressing this issue, which we term the Development-to-Phenotype, or D-P, rule, by leveraging single-cell gene expression atlases and phenotypic ontologies, using Caenorhabditis elegans as a model system. This framework quantifies the proportionality between developmental expression and phenotypic similarities, demonstrating that the relationship holds on average. Genes that strongly fulfill the rule exhibit broad "housekeeping" expression and are associated with systemic phenotypes, whereas weak similarities correspond to specific expression patterns and specialized phenotypes. Deviations from the D-P rule provide insights into developmental divergence and phenotypic degeneracy, highlighting genes with narrow functional roles but systemic phenotypic impact. Furthermore, genes that closely adhere to the rule exhibit the highest pleiotropic impact on organismal traits. Our analysis also identifies cell types, such as ASK neurons, as key mediators of phenotype-specific gene contributions, exemplified by their association with chemosensory behavior and chemotaxis. These findings validate the D-P rule and underscore the role of cells as critical mediators of the genotype-phenotype map, offering a unified framework to understand the developmental origins of phenotypic complexity.
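
One way to picture "quantifying the proportionality between developmental expression and phenotypic similarities" is to score every gene pair twice and correlate the two scores. The sketch below assumes cosine similarity of per-cell-type expression vectors, Jaccard similarity of phenotype-term sets, and a Pearson correlation across pairs; the abstract does not state which measures the authors use, and the function and gene names are hypothetical.

```python
import math
from itertools import combinations


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def jaccard(a, b):
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def pearson(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("similarities are constant; correlation is undefined")
    return cov / (sx * sy)


def dp_correlation(expression, phenotypes):
    """Pearson correlation between expression and phenotype similarity over gene pairs.

    expression: gene -> expression vector across cell types (one entry per cell type)
    phenotypes: gene -> set of phenotype-ontology terms annotated to its mutants
    """
    genes = sorted(set(expression) & set(phenotypes))
    expr_sim, pheno_sim = [], []
    for g1, g2 in combinations(genes, 2):
        expr_sim.append(cosine(expression[g1], expression[g2]))
        pheno_sim.append(jaccard(phenotypes[g1], phenotypes[g2]))
    return pearson(expr_sim, pheno_sim)
```

A positive correlation means that, on average, genes expressed in similar cell types also give similar mutant phenotypes; individual pairs far from the trend would correspond to the deviations discussed in the abstract.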

PLoS Computational Biology, 22(2): e1013944
Citations: 0
Transcriptomic-guided whole-slide image classification for molecular subtype identification.
IF 3.6 Zone 2 (Biology) Q1 BIOCHEMICAL RESEARCH METHODS Pub Date : 2026-02-09 DOI: 10.1371/journal.pcbi.1013950
Weiwen Wang, Xiwen Zhang, Yuanyan Xiong

Recent advances in computational pathology have greatly improved automated histopathological analysis. A compelling question in the field is how morphological traits are associated with genetic characteristics or molecular phenotypes. Here we propose TEMI, a novel framework for molecular subtype classification of cancers from whole-slide images (WSIs) that is augmented with transcriptomic data during training. TEMI aims to extract molecular-level signals from WSIs and make efficient use of available multimodal data. To this end, TEMI introduces a patch fusion network that captures dependencies among local patches of gigapixel WSIs to produce global representations, and aligns these with transcriptomic embeddings obtained from a masked transcriptomic autoencoder. TEMI outperforms existing methods in molecular subtype classification, owing to the effective integration of transcriptomic information through its two alignment strategies. Guided by discriminative transcriptomic data, TEMI learns invariant WSI representations, while morphological features also enhance gene expression prediction. These findings suggest that histological features encode latent molecular signals, highlighting the interplay between the tumor microenvironment and cancer transcriptomics. Our study demonstrates how multimodal learning can bridge morphology and molecular biology, providing an effective tool to advance precision medicine.
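
The abstract does not describe TEMI's two alignment strategies. For orientation only, here is one common way to align paired embeddings from two modalities: a symmetric, CLIP-style InfoNCE contrastive loss in which slide i should match transcriptome i and no other sample in the batch. This is a swapped-in, generic technique rather than TEMI's method, and the function names and temperature are assumptions.

```python
import math


def l2_normalize(v):
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]


def log_softmax_at(row, index):
    """log(softmax(row)[index]) computed stably via log-sum-exp."""
    m = max(row)
    log_z = m + math.log(sum(math.exp(x - m) for x in row))
    return row[index] - log_z


def symmetric_info_nce(slide_emb, rna_emb, temperature=0.1):
    """Contrastive loss over a batch of (slide, transcriptome) embedding pairs."""
    s = [l2_normalize(v) for v in slide_emb]
    r = [l2_normalize(v) for v in rna_emb]
    n = len(s)
    # logits[i][j]: scaled cosine similarity between slide i and transcriptome j.
    logits = [[sum(a * b for a, b in zip(s[i], r[j])) / temperature for j in range(n)]
              for i in range(n)]
    slide_to_rna = -sum(log_softmax_at(logits[i], i) for i in range(n)) / n
    columns = [[logits[i][j] for i in range(n)] for j in range(n)]
    rna_to_slide = -sum(log_softmax_at(columns[j], j) for j in range(n)) / n
    return 0.5 * (slide_to_rna + rna_to_slide)
```

The loss is near zero when each slide embedding points in the same direction as its own transcriptomic embedding and large when pairs are shuffled, which is what pushes WSI representations toward molecular information during training.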

PLoS Computational Biology, 22(2): e1013950
Citations: 0
FKSUDDAPre: A drug-disease association prediction framework based on F-TEST feature selection and AMDKSU resampling with interpretability analysis.
IF 3.6 Zone 2 (Biology) Q1 BIOCHEMICAL RESEARCH METHODS Pub Date : 2026-02-05 DOI: 10.1371/journal.pcbi.1013947
Yun Zuo, Chenyi Zhang, Ge Hua, Qiao Ning, Xiangrong Liu, Xiangxiang Zeng, Zhaohong Deng

In drug discovery and therapeutic research, the prediction of drug-disease associations (DDAs) holds significant scientific and clinical value. Drug molecules exert their effects by precisely recognizing disease-related biological targets, systematically modulating the entire pharmacological process from absorption, distribution, and metabolism to final efficacy. Accurate prediction of drug-disease associations not only facilitates an in-depth understanding of the molecular mechanisms of drug action but also provides critical theoretical foundations for drug repositioning and personalized medicine. Although traditional prediction methods based on in vitro experiments and clinical statistics yield reliable results, they suffer from inherent drawbacks such as long development cycles, substantial resource consumption, and low throughput. Emerging machine learning techniques offer a promising solution to these bottlenecks, enabling intelligent and efficient discovery of potential drug-disease association networks and significantly improving drug development efficiency. However, existing machine learning methods still face significant challenges in practice: the complexity of feature construction raises the threshold for data processing, data sparsity limits the depth of information mining, and pervasive sample imbalance severely challenges the model's predictive accuracy and generalization. In this study, we developed an efficient and accurate framework for drug-disease association prediction named FKSUDDAPre. The model employs a multi-modal feature fusion strategy: on the one hand, it uses an ensemble of Mol2vec and K-BERT to capture the semantic features of drug molecular fingerprints; on the other hand, it integrates Medical Subject Headings (MeSH) with DeepWalk to reduce the dimensionality of disease features while preserving their relational structure. To address class imbalance, we designed an optimization algorithm called AMDKSU that combines clustering with an improved distance-metric strategy, significantly enhancing the discriminative power of the sample set. For data processing, an F-test was used to rank feature importance, reducing data dimensionality and improving model generalization. For prediction, FKSUDDAPre uses a novel ensemble of XGBoost, Decision Tree, Random Forest, and HyperFast; through a dynamic weight-allocation strategy, the ensemble harnesses the complementary strengths of these models to achieve significantly better predictive performance. Rigorous validation demonstrated strong performance across multiple evaluation metrics, with an average AUC of 0.9725, approximately 3.88% higher than the best-performing baseline model. In predictions for Alzheimer's disease and Parkinson's disease, 80% and 60%, respectively, of the top 10 candidate drugs recommended by FKSUDDAPre were confirmed in the literature, indicating good potential for practical application. In addition, we performed a LIME-based feature importance analysis of the model's predictions, visualizing the correlations between features and the target variable to demonstrate the model's interpretability. A cross-platform, user-friendly visualization tool was also developed with the PyQt5 framework.
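
The F-test ranking step can be illustrated with the one-way ANOVA F statistic per feature: between-class variance of the feature divided by its within-class variance, with higher scores ranked first. The abstract does not give the authors' exact configuration, so the sketch below is an assumption-level illustration with hypothetical function names; scikit-learn's `sklearn.feature_selection.f_classif` computes the same ANOVA F-value for array inputs.

```python
import math


def anova_f(values, labels):
    """One-way ANOVA F statistic of one feature across class labels."""
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(y, []).append(v)
    n, k = len(values), len(groups)
    if k < 2 or n <= k:
        raise ValueError("need at least two classes and more samples than classes")
    grand_mean = sum(values) / n
    means = {y: sum(g) / len(g) for y, g in groups.items()}
    # Between-class sum of squares (df = k - 1) and within-class sum of squares (df = n - k).
    between = sum(len(g) * (means[y] - grand_mean) ** 2 for y, g in groups.items())
    within = sum((v - means[y]) ** 2 for y, g in groups.items() for v in g)
    if within == 0:
        return math.inf if between > 0 else 0.0
    return (between / (k - 1)) / (within / (n - k))


def rank_features(matrix, labels, top_k):
    """Column indices of `matrix` with the top_k F scores, highest first."""
    n_features = len(matrix[0])
    scores = [anova_f([row[j] for row in matrix], labels) for j in range(n_features)]
    return sorted(range(n_features), key=lambda j: scores[j], reverse=True)[:top_k]
```

Keeping only the top-ranked columns before training is what reduces dimensionality; the F statistic is computed per feature, so it ignores interactions between features.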
PLoS Computational Biology, 22(2): e1013947
Citations: 0
Modeling human visuomotor adaptation with a disturbance observer framework.
IF 3.6 Zone 2 (Biology) Q1 BIOCHEMICAL RESEARCH METHODS Pub Date: 2026-02-04 DOI: 10.1371/journal.pcbi.1013937
Gaurav Sharma, Bernard Marius 't Hart, Jean-Jacques Orban de Xivry, Denise Y P Henriques, Mireille E Broucke

A fundamental problem in visuomotor adaptation research is to understand how the brain is able to asymptotically remove a predictable exogenous disturbance from a visual error signal, using limited sensor information, by recalibrating hand movements. From a control theory perspective, the most striking aspect of this problem is that it falls squarely within the scope of the internal model principle of control theory. Despite this, the relationship between the internal model principle and models of visuomotor adaptation remains poorly developed. This paper aims to close this gap by proposing an abstract discrete-time state-space model of visuomotor adaptation based on the internal model principle. The proposed DO Model, whose name refers to its most important component, a disturbance observer, addresses key modeling requirements: modular architecture, physically relevant signals, parameters tied to atomic behaviors, and capacity for abstraction. Its two main computational modules are a disturbance observer, a recently developed class of internal models, and a feedforward system that learns from the disturbance observer to improve feedforward motor commands.
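A generic disturbance observer can illustrate the core idea: an internal model of the disturbance lets the observer estimate the disturbance and cancel it. The sketch below is not the authors' DO Model, and its feedforward learning module is omitted. It rests on these illustrative assumptions:

- the visuomotor rotation is constant;
- the visual error equals hand direction plus rotation, with the target at 0 degrees;
- the observer's internal model is an integrator;
- the observer gain `gain` is hypothetical, as is the function name `simulate_do`.

```python
from __future__ import annotations


def simulate_do(rotation_deg: float, gain: float, n_trials: int) -> list[float]:
    """Return the visual error on each trial (illustrative sketch only).

    Assumptions: error[k] = hand[k] + rotation_deg (target at 0 degrees);
    the disturbance is constant, so the observer's internal model is an
    integrator; the hand command cancels the current estimate.
    """
    d_hat = 0.0  # observer's running estimate of the exogenous disturbance
    errors = []
    for _ in range(n_trials):
        hand = -d_hat                # command that cancels the estimated disturbance
        error = hand + rotation_deg  # measured visual error on this trial
        errors.append(error)
        # Residual between measured and predicted error (predicted = hand + d_hat);
        # here it equals rotation_deg - d_hat, so d_hat is pulled toward the truth.
        residual = error - (hand + d_hat)
        d_hat += gain * residual
    return errors


if __name__ == "__main__":
    errs = simulate_do(rotation_deg=30.0, gain=0.2, n_trials=40)
    print([round(e, 2) for e in errs[:5]])
```

In this toy setup the error shrinks geometrically by a factor of `1 - gain` per trial whenever `0 < gain < 2`. That asymptotic rejection is what the internal model principle predicts for a constant disturbance.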

Citations: 0
Phase resetting in human stem cell derived cardiomyocytes explains complex cardiac arrhythmias.
IF 3.6 Zone 2 (Biology) Q1 BIOCHEMICAL RESEARCH METHODS Pub Date: 2026-02-04 DOI: 10.1371/journal.pcbi.1013935
Khady Diagne, Thomas M Bury, Morgan E Pettebone, Marc W Deyell, Zachary Laksman, Alvin Shrier, Leon Glass, Gil Bub, Emilia Entcheva

Phase resetting of cardiac oscillators underlies some complex arrhythmias. Here we use optogenetic stimulation to construct phase response curves (PRCs) for spheroids of human induced pluripotent stem cell-derived cardiomyocytes (hiPSC-CMs), together with a computational cardiomyocyte model to identify the ionic mechanisms shaping the PRC. We demonstrate the clinical utility of the human PRCs by adding a patient-based conduction delay to the same equations, which then explain complex multi-day Holter ECG dynamics and cardiac arrhythmias. Periodic stimulation of these patient-based models and of the computational hiPSC-CM model reveals similar bifurcation patterns and entrainment zones. Cell therapy that injects iPSC-CMs into diseased hearts can induce engraftment arrhythmias driven by ectopic foci. PRC analysis offers a potential strategy to entrain these foci in a region of parameter space that avoids such arrhythmias.
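A standard way to turn a PRC into predictions about periodic stimulation is to iterate a one-dimensional circle map. The sketch below assumes a hypothetical sinusoidal PRC with amplitude parameter `b`. Stimulation is applied at a fixed period ratio `tau`, the stimulus period divided by the intrinsic cycle length. The sketch does not use the measured hiPSC-CM PRCs, and the names `prc` and `rotation_number` are illustrative.

```python
import math


def prc(phase: float, b: float) -> float:
    """Hypothetical sinusoidal PRC: phase shift (in cycles) from a stimulus at `phase`."""
    return -(b / (2.0 * math.pi)) * math.sin(2.0 * math.pi * phase)


def rotation_number(tau: float, b: float, n_iter: int = 2000, n_skip: int = 500) -> float:
    """Mean number of intrinsic cycles elapsed per stimulus under periodic stimulation.

    Each stimulus resets the phase by prc(phase), after which the oscillator
    advances tau cycles until the next stimulus. The phase is kept on the
    unwrapped (lifted) line, Phi_{n+1} = Phi_n + prc(Phi_n mod 1) + tau,
    so the average advance per stimulus estimates the rotation number.
    """
    phi = 0.1  # arbitrary initial phase
    for _ in range(n_skip):  # discard the transient
        phi += prc(phi % 1.0, b) + tau
    start = phi
    for _ in range(n_iter):
        phi += prc(phi % 1.0, b) + tau
    return (phi - start) / n_iter


if __name__ == "__main__":
    for tau in (1.05, 1.2):
        print(f"tau={tau}: rotation number ~ {rotation_number(tau, b=0.5):.4f}")
```

In this toy map a rotation number of exactly 1 means 1:1 entrainment (one beat per stimulus). That happens when `|tau - 1| <= b / (2*pi)`:

- With `b=0.5`, `tau=1.05` lies inside that zone, so the estimate is 1 to numerical precision.
- `tau=1.2` lies outside it, so the estimate is strictly above 1.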

Citations: 0
TARPON-A Telomere Analysis and Research Pipeline Optimized for Nanopore.
IF 3.6 Zone 2 (Biology) Q1 BIOCHEMICAL RESEARCH METHODS Pub Date: 2026-02-04 eCollection Date: 2026-02-01 DOI: 10.1371/journal.pcbi.1013915
Nathaniel Deimler, David V Ho, Norbert Paul, Zoë Gill, Peter Baumann

Long-read sequencing has transformed many areas of biology and holds significant promise for telomere research. It enables nucleotide-resolution, chromosome arm-specific measurement of telomere length in both model organisms and humans. However, adopting new technologies, particularly in clinical or diagnostic contexts, requires careful validation to identify potential technical and computational limitations. We present TARPON (Telomere Analysis and Research Pipeline Optimized for Nanopore), a best-practices Nextflow pipeline for analyzing telomeres sequenced on the Oxford Nanopore Technologies (ONT) platform. TARPON can be run from the command line or integrated into ONT's EPI2ME agent, which provides a user-friendly graphical interface for users without computational training. Nextflow's container-based architecture eliminates dependency conflicts, streamlining deployment across platforms. TARPON isolates reads containing telomeric repeats, assigns strand specificity, and identifies enrichment probes, which can be used both for demultiplexing and for confirming capture-based library preparation. To restrict the analysis to full-length telomeres, it excludes two kinds of reads: those lacking a capture probe and those lacking non-telomeric sequence at the opposite end. A sliding-window approach defines the subtelomere-to-telomere boundary. Quality filtering then removes low-quality or subtelomeric reads that passed earlier steps. The pipeline generates customizable statistics, text-based summaries, and publication-ready visualizations (HTML, PNG, PDF). While the default settings are optimized for diagnostic workflows, all parameters can be adjusted via the GUI or command line to support diverse applications. These include telomere analyses of variant-rich samples (e.g., ALT-positive tumors) and of organisms with non-canonical telomeric repeats, such as some insects (GTTAG) and certain plants (GGTTTAG).
TARPON is the first complete and experimentally validated pipeline for Nanopore-based telomere analysis requiring no data pre-processing or prior bioinformatics expertise, while offering flexibility for advanced users.
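A sliding-window boundary call of the kind described above can be sketched in a few lines. This is not TARPON's implementation. The function `telomere_boundary` and its window size, step and coverage threshold are hypothetical placeholders, not TARPON defaults. The sketch assumes a read oriented from subtelomere to telomere, with the repeat tract at its 3' end.

```python
from __future__ import annotations

import re


def telomere_boundary(read: str, repeat: str = "TTAGGG", window: int = 60,
                      step: int = 6, min_frac: float = 0.8) -> int | None:
    """Index where the terminal telomeric tract starts, or None if none is found.

    Hypothetical sketch: windows are scanned from the 3' (telomeric) end
    toward the 5' end, and scanning stops at the first window whose repeat
    coverage falls below min_frac.
    """
    pattern = re.compile(re.escape(repeat))
    window_start = None  # start of the innermost window that still passed
    start = len(read) - window
    while start >= 0:
        chunk = read[start:start + window]
        covered = len(pattern.findall(chunk)) * len(repeat)  # bases in full repeats
        if covered / window < min_frac:
            break
        window_start = start
        start -= step
    if window_start is None:
        return None
    # The window grid limits resolution, so snap to the first full repeat
    # inside the innermost passing window.
    return read.find(repeat, window_start)


if __name__ == "__main__":
    read = "ACGTTGCA" * 20 + "TTAGGG" * 50  # 160 bp mock subtelomere + 300 bp telomere
    b = telomere_boundary(read)
    print("boundary:", b, "telomere length:", len(read) - b)
```

Passing a different `repeat`, such as `"GTTAG"`, applies the same scan to a non-canonical repeat. A real pipeline would also need to tolerate sequencing errors and repeat variants, which this exact-match sketch does not.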

Citations: 0
SHADE: A multilevel Bayesian framework for modeling directional spatial interactions in tissue microenvironments.
IF 3.6 Zone 2 (Biology) Q1 BIOCHEMICAL RESEARCH METHODS Pub Date: 2026-02-04 eCollection Date: 2026-02-01 DOI: 10.1371/journal.pcbi.1013930
Joel Eliason, Michele Peruzzi, Arvind Rao

Motivation: Understanding how different cell types interact spatially within tissue microenvironments is critical for deciphering immune dynamics, tumor progression, and tissue organization. Many current spatial analysis methods assume symmetric associations or compute image-level summaries separately without sharing information across patients and cohorts, limiting biological interpretability and statistical power.

Results: We present SHADE (Spatial Hierarchical Asymmetry via Directional Estimation), a multilevel Bayesian framework for modeling asymmetric spatial interactions across scales. SHADE quantifies direction-specific cell-cell associations using smooth spatial interaction curves (SICs) and integrates data across tissue sections, patients, and cohorts. Through simulation studies, SHADE demonstrates improved accuracy, robustness, and interpretability over existing methods. Application to colorectal cancer multiplexed imaging data demonstrates SHADE's ability to quantify directional spatial patterns while controlling for tissue architecture confounders and capturing substantial patient-level heterogeneity. The framework successfully identifies biologically interpretable spatial organization patterns, revealing that local microenvironmental structure varies considerably across patients within molecular subtypes.
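A much simpler descriptive statistic can illustrate the directional asymmetry that SHADE models. The sketch below is not SHADE's Bayesian spatial interaction curves. It assumes 2-D cell centroids and computes the mean distance from each cell of one type to its nearest cell of another type. This quantity generally changes when the direction is reversed. The function name `mean_nn_distance` and the coordinates are hypothetical.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]  # (x, y) centroid of a cell


def mean_nn_distance(source: Sequence[Point], target: Sequence[Point]) -> float:
    """Mean distance from each source cell to its nearest target cell.

    Not symmetric: mean_nn_distance(a, b) != mean_nn_distance(b, a) in general.
    """
    if not source or not target:
        raise ValueError("both cell sets must be non-empty")
    total = 0.0
    for p in source:
        total += min(math.dist(p, q) for q in target)  # nearest target cell
    return total / len(source)


if __name__ == "__main__":
    type_a = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]     # clustered around one B cell
    type_b = [(0.5, 0.5), (10.0, 10.0), (20.0, 0.0)]  # mostly far from any A cell
    print("A->B:", mean_nn_distance(type_a, type_b))
    print("B->A:", mean_nn_distance(type_b, type_a))
```

Here every A cell sits next to one B cell, so the A-to-B distance is small (about 0.71). Most B cells are far from any A cell, so the B-to-A distance is much larger (roughly 11). A symmetric summary would hide this difference.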

Citations: 0