Pub Date: 2026-02-09 DOI: 10.1371/journal.pcbi.1013909
Nikol Chantzi, Ioannis Mouratidis, Ilias Georgakopoulos-Soares
Zimin words are words that have the same prefix and suffix. They are unavoidable patterns, with all sufficiently large strings encompassing them. Here, we examine for the first time the presence of k-mers not containing any Zimin patterns, defined hereafter as Zimin avoidmers, in the human genome. We report that in the reference human genome all k-mers above 104 base-pairs contain Zimin words. We find that Zimin avoidmers are most enriched in coding and Human Satellite 1 regions in the human genome. Zimin avoidmers display a depletion of germline insertions and deletions relative to surrounding genomic areas. We also apply our methodology in the genomes of another eight model organisms from all three domains of life, finding large differences in their Zimin avoidmer frequencies and their genomic localization preferences. We observe that Zimin avoidmers exhibit the highest genomic density in prokaryotic organisms, with E. coli showing particularly high levels, while the lowest density is found in eukaryotic organisms, with D. rerio having the lowest. Among the studied genomes the longest k-mer length at which Zimin avoidmers are observed is that of S. cerevisiae at k-mer length of 115 base-pairs. We conclude that Zimin avoidmers display inhomogeneous distributions in organismal genomes, have intricate properties including lower insertion and deletion rates, and disappear faster than the theoretical expected k-mer length, across the organismal genomes studied.
Title: Zimin patterns in genomes. Journal: PLoS Computational Biology 22(2): e1013909.
Pub Date: 2026-02-09 DOI: 10.1371/journal.pcbi.1013925
Ria Vinod, Ava P Amini, Lorin Crawford, Kevin K Yang
Protein language models (PLMs) pretrained via a masked language modeling objective have proven effective across a range of structure-related tasks, including high-resolution structure prediction. However, it remains unclear to what extent these models factorize protein structural categories among their learned parameters. In this work, we introduce trainable subnetworks, which mask out the PLM weights responsible for language modeling performance on a structural category of proteins. We systematically trained 39 PLM subnetworks targeting both sequence- and residue-level features at varying degrees of resolution using annotations defined by the CATH taxonomy and secondary structure elements. Using these PLM subnetworks, we assessed how structural factorization in PLMs influences downstream structure prediction. Our results show that PLMs are highly sensitive to sequence-level features and can predominantly disentangle extremely coarse or fine-grained information. Furthermore, we observe that structure prediction is highly responsive to factorized PLM representations and that small changes in language modeling performance can significantly impair PLM-based structure prediction capabilities. Our work presents a framework for studying feature entanglement within pretrained PLMs and can be leveraged to improve the alignment of learned PLM representations with known biological concepts.
Title: Trainable subnetworks reveal insights into structure knowledge organization in protein language models. Journal: PLoS Computational Biology 22(2): e1013925.
Pub Date: 2026-02-09 DOI: 10.1371/journal.pcbi.1013945
Cléophée Van Maele, Ségolène Caboche, Nathan Nicolau-Guillaumet, Anaëlle Muggeo, Thomas Guillard
Transposon Sequencing (Tn-Seq) is a high-throughput technique that utilizes transposon mutant libraries to assess gene fitness or essentiality under specific conditions, potentially identifying novel therapeutic targets. However, the diversity of statistical methods, bioinformatics tools, and parameters complicates the selection of the most appropriate and reliable analysis pipeline for a given dataset. A significant limitation of existing studies is the absence of a gold-standard set of essential genes (EGs) for evaluating the analysis process. Relying on the original study as a gold standard is suboptimal, as these results may have been obtained using non-optimal tools. Here, we introduce reliable EG datasets for Pseudomonas aeruginosa to enhance Tn-Seq analyses. By utilizing literature data and sequencing of six samples from PA14 Wild-Type (WT) and PA14 OprD-deficient (ΔoprD), grown in LB medium, we compared EG lists generated by several statistical methods of TRANSIT2 and by the FiTnEss tools. We established a reference dataset of 84 genes found in P. aeruginosa and another gold-standard set composed of 115 genes specific to PA14 grown in LB. Our findings revealed that depending on the analysis method used, retrieval rates of gold-standard genes ranged from 0% to 100%. The Hidden-Markov Model (HMM) method available in TRANSIT2 identified approximately 90% of gold-standard EGs, while FiTnEss identified up to 100%. This study addressed a critical gap in the field by providing gold-standard sets of EGs, enabling comparative evaluation of Tn-Seq analysis methods to help researchers select the most suitable bioinformatics pipeline for a given Tn-Seq dataset. We anticipate that our results will facilitate Tn-Seq analysis comparisons, harmonize P. aeruginosa-related studies, promote standardization and enhance reproducibility. Ultimately, this will lead to more reliable identification of EGs and potential therapeutic targets in P. aeruginosa, advancing our understanding of this important pathogen.
Title: Introducing gold-standard essential gene datasets for Pseudomonas aeruginosa to enhance Tn-Seq analyses. Journal: PLoS Computational Biology 22(2): e1013945.
Pub Date: 2026-02-09 DOI: 10.1371/journal.pcbi.1013944
Alicia Lou, Mónica Chagoyen, Juan F Poyatos
It is widely acknowledged that development shapes phenotypes, yet the extent to which genes with similar expression patterns during development lead to equivalent organismal phenotypes when mutated remains unclear. Here, we propose addressing this issue, which we term the Development-to-Phenotype, or D-P, rule, by leveraging single-cell gene expression atlases and phenotypic ontologies, using Caenorhabditis elegans as a model system. This framework quantifies the proportionality between developmental expression and phenotypic similarities, demonstrating that the relationship holds on average. Genes that strongly fulfill the rule exhibit broad "housekeeping" expression and are associated with systemic phenotypes, whereas weak similarities correspond to specific expression patterns and specialized phenotypes. Deviations from the D-P rule provide insights into developmental divergence and phenotypic degeneracy, highlighting genes with narrow functional roles but systemic phenotypic impact. Furthermore, genes that closely adhere to the rule exhibit the highest pleiotropic impact on organismal traits. Our analysis also identifies cell types, such as ASK neurons, as key mediators of phenotype-specific gene contributions, exemplified by their association with chemosensory behavior and chemotaxis. These findings validate the D-P rule and underscore the role of cells as critical mediators of the genotype-phenotype map, offering a unified framework to understand the developmental origins of phenotypic complexity.
Title: Cell atlases and the developmental foundations of the phenotype. Journal: PLoS Computational Biology 22(2): e1013944.
Pub Date: 2026-02-09 DOI: 10.1371/journal.pcbi.1013950
Weiwen Wang, Xiwen Zhang, Yuanyan Xiong
Recent advancements in computational pathology have greatly improved automated histopathological analysis. A compelling question in the field is how morphological traits are associated with genetic characteristics or molecular phenotypes. Here we propose TEMI, a novel framework for molecular subtype classification of cancers using whole-slide images (WSIs), augmented with transcriptomic data during training. TEMI aims to extract molecular-level signals from WSIs and make efficient use of available multimodal data. To this end, TEMI introduces a patch fusion network that captures dependencies among local patches of gigapixel WSIs to produce global representations and aligns them with transcriptomic embeddings attained from a masked transcriptomic autoencoder. TEMI achieves superior performance compared with existing methods in molecular subtype classification, owing to its effective integration of transcriptomic information achieved by the two developed alignment strategies. Guided by discriminative transcriptomic data, TEMI learns invariant WSI representations, while morphological features also enhance gene expression prediction. These findings suggest that histological features encode latent molecular signals, highlighting the interplay between the tumor microenvironment and cancer transcriptomics. Our study demonstrates how multimodal learning can bridge morphology and molecular biology, providing an effective tool to advance precision medicine.
Title: Transcriptomic-guided whole-slide image classification for molecular subtype identification. Journal: PLoS Computational Biology 22(2): e1013950.
Pub Date: 2026-02-05 DOI: 10.1371/journal.pcbi.1013947
Yun Zuo, Chenyi Zhang, Ge Hua, Qiao Ning, Xiangrong Liu, Xiangxiang Zeng, Zhaohong Deng
In drug discovery and therapeutic research, the prediction of drug-disease associations (DDAs) holds significant scientific and clinical value. Drug molecules exert their effects by precisely identifying disease-related biological targets, systematically modulating the entire pharmacological process from absorption, distribution, and metabolism to final efficacy. Accurate prediction of drug-disease associations not only facilitates an in-depth understanding of molecular mechanisms of drug action but also provides critical theoretical foundations for drug repositioning and personalized medicine. While traditional prediction methods based on in vitro experiments and clinical statistics yield reliable results, they suffer from inherent drawbacks such as long development cycles, substantial resource consumption, and low throughput. In contrast, emerging machine learning techniques offer a promising solution to these bottlenecks, enabling the intelligent and efficient discovery of potential drug-disease association networks and significantly improving drug development efficiency. However, it is noteworthy that existing machine learning methods still face significant challenges in practical applications: the complexity of feature construction raises the threshold for data processing; data sparsity constrains the depth of information mining; and the pervasive issue of sample imbalance poses a severe challenge to the model's predictive accuracy and generalization performance. In this study, we developed an efficient and accurate framework for drug-disease association prediction named FKSUDDAPre. The model employs a multi-modal feature fusion strategy: on one hand, it leverages an ensemble of Mol2vec and K-BERT to deeply capture the semantic features of drug molecular fingerprints; on the other hand, it integrates Medical Subject Headings (MeSH) with DeepWalk to effectively reduce the dimensionality of disease features while preserving their relational structure. 
To address the class imbalance problem, FKSUDDAPre designed an optimization algorithm called AMDKSU, which combined clustering with an improved distance metric strategy, significantly enhancing the discriminative power of the sample set. For data processing, F-test was employed for feature importance ranking, effectively reducing data dimensionality and improving model generalization. For the predictive architecture, FKSUDDAPre proposed a novel ensemble framework composed of XGBoost, Decision Tree, Random Forest, and HyperFast. By employing a dynamic weight allocation strategy, this ensemble effectively harnesses the complementary strengths of these models to achieve significantly enhanced predictive performance. Rigorous validation demonstrated the system's outstanding performance across multiple evaluation metrics, with an average AUC of 0.9725, improving the AUC by approximately 3.88% compared to the best-performing baseline model. In the prediction of Alzheimer's disease and Parkinson'
Title: FKSUDDAPre: A drug-disease association prediction framework based on F-TEST feature selection and AMDKSU resampling with interpretability analysis. Journal: PLoS Computational Biology 22(2): e1013947.
Pub Date: 2026-02-04 DOI: 10.1371/journal.pcbi.1013937
Gaurav Sharma, Bernard Marius 't Hart, Jean-Jacques Orban de Xivry, Denise Y P Henriques, Mireille E Broucke
A fundamental problem of visuomotor adaptation research is to understand how the brain is able to asymptotically remove a predictable exogenous disturbance from a visual error signal, using limited sensor information, by recalibrating hand movement. From a control theory perspective, the most striking aspect of this problem is that it falls squarely in the realm of the internal model principle. Despite this fact, the relationship between the internal model principle and models of visuomotor adaptation is currently not well developed. This paper aims to close this gap by proposing an abstract discrete-time state space model of visuomotor adaptation based on the internal model principle. The proposed DO Model, a metonym for its most important component, a disturbance observer, addresses key modeling requirements: modular architecture, physically relevant signals, parameters tied to atomic behaviors, and capacity for abstraction. The two main computational modules are a disturbance observer, a recently developed class of internal models, and a feedforward system that learns from the disturbance observer to improve feedforward motor commands.
Title: Modeling human visuomotor adaptation with a disturbance observer framework. Journal: PLoS Computational Biology 22(2): e1013937.
Pub Date: 2026-02-04 DOI: 10.1371/journal.pcbi.1013935
Khady Diagne, Thomas M Bury, Morgan E Pettebone, Marc W Deyell, Zachary Laksman, Alvin Shrier, Leon Glass, Gil Bub, Emilia Entcheva
Phase resetting of cardiac oscillators underlies some complex arrhythmias. Here we use optogenetic stimulation to construct phase response curves (PRC) for spheroids of human induced pluripotent stem cell derived cardiomyocytes (hiPSC-CM) and a computational cardiomyocyte model to identify ionic mechanisms shaping the PRC. The clinical utility of the human PRCs is demonstrated by adding a patient-based conduction delay to the same equations to explain complex multi-day Holter ECG dynamics and cardiac arrhythmias. Periodic stimulation of these patient-based models and the computational model of human iPSC-CM reveal similar bifurcation patterns and entrainment zones. Cell therapy by injecting iPSC-CM into diseased hearts can induce ectopic foci-based engraftment arrhythmias. The PRC analysis offers a potential strategy to entrain these foci in a parameter space that avoids such arrhythmias.
Title: Phase resetting in human stem cell derived cardiomyocytes explains complex cardiac arrhythmias. PLoS Computational Biology 22(2): e1013935. Published 2026-02-04. https://doi.org/10.1371/journal.pcbi.1013935
Pub Date: 2026-02-04; eCollection Date: 2026-02-01; DOI: 10.1371/journal.pcbi.1013915
Nathaniel Deimler, David V Ho, Norbert Paul, Zoë Gill, Peter Baumann
Long-read sequencing has transformed many areas of biology and holds significant promise for telomere research by enabling chromosome arm-specific telomere length analysis at nucleotide-level resolution in both model organisms and humans. However, the adoption of new technologies, particularly in clinical or diagnostic contexts, requires careful validation to recognize potential technical and computational limitations. We present TARPON (Telomere Analysis and Research Pipeline Optimized for Nanopore), a best-practices Nextflow pipeline designed for the analysis of telomeres sequenced on the Oxford Nanopore Technologies (ONT) platform. TARPON can be executed via the command line or integrated into ONT's EPI2ME agent, providing a user-friendly graphical interface for those without computational training. Nextflow's container-based architecture eliminates dependency conflicts, thereby streamlining deployment across platforms. TARPON isolates telomeric repeat-containing reads, assigns strand specificity, and identifies enrichment probes that can be used both for demultiplexing and for confirming capture-based library preparation. To ensure that the analysis is restricted to full-length telomeres, reads lacking a capture probe or non-telomeric sequence on the opposite end are excluded. A sliding-window approach defines the subtelomere-to-telomere boundary, followed by quality filtering to remove low-quality or subtelomeric reads that passed earlier steps. The pipeline generates customizable statistics, text-based summaries, and publication-ready visualizations (HTML, PNG, PDF). While default settings are optimized for diagnostic workflows, all parameters are easily adjustable via the GUI or command line to support diverse applications. These include telomere analyses in variant-rich samples (e.g., ALT-positive tumors) and organisms with non-canonical telomeric repeats such as some insects (GTTAG) and certain plants (GGTTTAG). TARPON is the first complete and experimentally validated pipeline for Nanopore-based telomere analysis requiring no data pre-processing or prior bioinformatics expertise, while offering flexibility for advanced users.
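The sliding-window idea mentioned in the abstract can be illustrated with a short sketch. This is not TARPON's implementation: the function name `find_telomere_boundary`, the window size, step, and density threshold are all hypothetical, and the sketch assumes the read is oriented so that the telomeric tract (here the canonical `TTAGGG` repeat) runs to the 3' end. Windows are scanned from the 3' end inward until repeat density falls below the threshold, and the boundary is then refined to the first repeat copy inside the last accepted window.

```python
def repeat_density(window: str, repeat: str) -> float:
    """Fraction of bases in `window` covered by non-overlapping copies of `repeat`."""
    if not window:
        return 0.0
    return window.count(repeat) * len(repeat) / len(window)


def find_telomere_boundary(read: str, repeat: str = "TTAGGG",
                           window: int = 60, step: int = 6,
                           min_density: float = 0.8) -> int:
    """Return the index where the telomeric tract starts.

    Assumes the tract extends to the 3' end of `read`. Returns len(read)
    when no window at the 3' end reaches `min_density` (no telomere found).
    All default parameter values are illustrative, not TARPON defaults.
    """
    boundary = len(read)
    start = len(read) - window
    # Walk windows from the 3' end toward the subtelomere.
    while start >= 0:
        if repeat_density(read[start:start + window], repeat) < min_density:
            break
        boundary = start
        start -= step
    if boundary == len(read):
        return boundary
    # Refine: the tract begins at the first repeat copy in the last accepted window.
    return read.find(repeat, boundary)


if __name__ == "__main__":
    # Toy read: 60 bases of non-telomeric sequence followed by 50 repeat copies.
    toy_read = "ACGTACGTAC" * 6 + "TTAGGG" * 50
    print(find_telomere_boundary(toy_read))
```

For non-canonical repeats such as the GTTAG and GGTTTAG motifs named above, the `repeat` argument would be changed accordingly; real nanopore reads contain sequencing errors, so a production approach would need a more tolerant density measure than exact string matching.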
Title: TARPON-A Telomere Analysis and Research Pipeline Optimized for Nanopore. PLoS Computational Biology 22(2): e1013915. Published 2026-02-04. https://doi.org/10.1371/journal.pcbi.1013915. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12871981/pdf/
Pub Date: 2026-02-04; eCollection Date: 2026-02-01; DOI: 10.1371/journal.pcbi.1013930
Joel Eliason, Michele Peruzzi, Arvind Rao
Motivation: Understanding how different cell types interact spatially within tissue microenvironments is critical for deciphering immune dynamics, tumor progression, and tissue organization. Many current spatial analysis methods assume symmetric associations or compute image-level summaries separately without sharing information across patients and cohorts, limiting biological interpretability and statistical power.
Results: We present SHADE (Spatial Hierarchical Asymmetry via Directional Estimation), a multilevel Bayesian framework for modeling asymmetric spatial interactions across scales. SHADE quantifies direction-specific cell-cell associations using smooth spatial interaction curves (SICs) and integrates data across tissue sections, patients, and cohorts. Through simulation studies, SHADE demonstrates improved accuracy, robustness, and interpretability over existing methods. Application to colorectal cancer multiplexed imaging data demonstrates SHADE's ability to quantify directional spatial patterns while controlling for tissue architecture confounders and capturing substantial patient-level heterogeneity. The framework successfully identifies biologically interpretable spatial organization patterns, revealing that local microenvironmental structure varies considerably across patients within molecular subtypes.
Title: SHADE: A multilevel Bayesian framework for modeling directional spatial interactions in tissue microenvironments. PLoS Computational Biology 22(2): e1013930. Published 2026-02-04. https://doi.org/10.1371/journal.pcbi.1013930