Pub Date: 2026-02-02 | eCollection Date: 2026-02-01 | DOI: 10.1371/journal.pcbi.1013879
Sean R Maulhardt, Alec Solway, Caroline J Charpentier
When receiving a reward after a sequence of multiple events, how do we determine which event caused the reward? This problem, known as temporal credit assignment, can be difficult for humans to solve given the temporal uncertainty in the environment. Research to date has attempted to isolate dimensions of delay and reward during decision-making, but algorithmic solutions to temporal learning problems and the effect of uncertainty on learning remain underexplored. To further our understanding, we adapted a reward learning task that creates a temporal credit assignment problem by combining sequentially delayed rewards, intervening events, and varying uncertainty via the amount of information presented during feedback. Using computational modeling, two learning strategies were developed: an eligibility trace, whereby previously selected actions are updated as a function of the temporal sequence, and a tabular update, whereby only systematically related past actions (rather than unrelated intervening events) are updated. We hypothesized that reduced information uncertainty would correlate with increased use of the tabular strategy, given the model's capacity to incorporate additional feedback information. Both models effectively learned the task, and predicted choices made by participants (N = 142) as well as specific behavioral signatures of credit assignment. Consistent with our hypothesis, the tabular model outperformed the eligibility model under low information uncertainty, as evidenced by more accurate predictions of participants' behavior and an increase in tabular weight. These findings provide new insights into the mechanisms implemented by humans to solve temporal credit assignment and adapt their strategy in varying environments.
Title: Information uncertainty influences learning strategy from sequentially delayed rewards.
Journal: PLoS Computational Biology 22(2): e1013879. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12885371/pdf/
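The two update rules named in the abstract above can be illustrated with a toy sketch. This is not the authors' model: the learning rate `alpha`, the trace decay `decay`, the action labels, and the idea that the causally related action is handed to the tabular rule directly are all assumptions made for illustration.

```python
def eligibility_trace_update(values, chosen_actions, reward, alpha=0.1, decay=0.5):
    """Update every previously chosen action, weighted by recency.

    chosen_actions is ordered oldest to newest; the most recent action gets
    trace 1, the one before it gets `decay`, then `decay**2`, and so on.
    (alpha and decay are illustrative values, not fitted parameters.)
    """
    updated = dict(values)
    for steps_back, action in enumerate(reversed(chosen_actions)):
        trace = decay ** steps_back
        updated[action] += alpha * trace * (reward - updated[action])
    return updated


def tabular_update(values, causal_action, reward, alpha=0.1):
    """Update only the action assumed to be systematically linked to the reward."""
    updated = dict(values)
    updated[causal_action] += alpha * (reward - updated[causal_action])
    return updated


values = {"A": 0.0, "B": 0.0, "C": 0.0}
history = ["A", "B", "C"]  # A was chosen first, C most recently

# The reward is (hypothetically) caused by the earliest choice, A.
trace_values = eligibility_trace_update(values, history, reward=1.0)
table_values = tabular_update(values, causal_action="A", reward=1.0)
```

In this toy case the trace rule gives most of the credit to the most recent action, C, while the tabular rule credits only A. The abstract describes the same contrast between updating by temporal sequence and updating only systematically related actions.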
Pub Date: 2026-02-02 | eCollection Date: 2026-02-01 | DOI: 10.1371/journal.pcbi.1013923
Yang Zhou, Shahab Aslani, Yousef Javanmardi, Joseph Brunet, David Stansby, Saskia Carroll, Alexandre Bellier, Maximilian Ackermann, Paul Tafforeau, Peter D Lee, Claire L Walsh
Biomedical systems span multiple spatial scales, from tiny functional units to entire organs. Interpreting these systems through image segmentation requires the effective propagation and integration of information across different scales. However, most existing segmentation methods are optimised for single-scale imaging modalities, limiting their ability to capture and analyse small functional units throughout complete human organs. To facilitate multiscale biomedical image segmentation, we utilised Hierarchical Phase-Contrast Tomography (HiP-CT), an advanced imaging modality that can generate 3D multiscale datasets from high-resolution volumes of interest (VOIs) at ca. 1 µm/voxel to whole-organ scans at ca. 20 µm/voxel. Building on these hierarchical multiscale datasets, we developed a deep learning-based segmentation pipeline that is initially trained on manually annotated high-resolution HiP-CT data and then extended to lower-resolution whole-organ scans using pseudo-labels generated from high-resolution predictions and multiscale image registration. As a case study, we focused on glomeruli in human kidneys, benchmarking four 3D deep learning models for biomedical image segmentation on a manually annotated high-resolution dataset extracted from VOIs, at 2.58 to ca. 5 µm/voxel, of four human kidneys. Among them, nnUNet demonstrated the best performance, achieving an average test Dice score of 0.906, and was subsequently used as the baseline model for multiscale segmentation in the pipeline. Applying this pipeline to two low-resolution full-organ datasets at ca. 25 µm/voxel, the model identified 1,019,890 and 231,179 glomeruli in a 62-year-old donor without kidney disease and a 94-year-old hypertensive donor, respectively, enabling comprehensive morphological analyses, including cortical spatial statistics and glomerular distributions, which aligned well with previous anatomical studies. Our results highlight the effectiveness of the proposed pipeline for segmenting small functional units in multiscale bioimaging datasets and suggest its broader applicability to other organ systems.
Title: Multiscale segmentation using hierarchical phase-contrast tomography and deep learning.
Journal: PLoS Computational Biology 22(2): e1013923. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12880754/pdf/
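The Dice score reported in the abstract above is the standard overlap measure, 2|P ∩ R| / (|P| + |R|), between a predicted mask P and a reference mask R. The sketch below is a minimal illustration in which masks are assumed to be given as sets of foreground voxel coordinates. The toy coordinates and the convention of returning 1.0 for two empty masks are assumptions, not details from the paper.

```python
def dice_score(predicted, reference):
    """Dice overlap between two masks given as sets of (z, y, x) voxel coordinates."""
    predicted, reference = set(predicted), set(reference)
    total = len(predicted) + len(reference)
    if total == 0:
        return 1.0  # assumed convention: two empty masks agree perfectly
    return 2.0 * len(predicted & reference) / total


# Toy 3D masks (invented coordinates): 4 predicted voxels, 3 reference voxels,
# 2 of which overlap.
predicted_mask = {(0, 0, 0), (0, 0, 1), (0, 1, 0), (0, 1, 1)}
reference_mask = {(0, 0, 0), (0, 0, 1), (1, 0, 0)}

score = dice_score(predicted_mask, reference_mask)  # 2 * 2 / (4 + 3)
```

A score of 1 means the masks coincide exactly and 0 means they share no voxels.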
Pub Date: 2026-02-02 | eCollection Date: 2026-02-01 | DOI: 10.1371/journal.pcbi.1013194
Emily Liu, Jiaqi Zhang, Caroline Uhler
Advances in sequencing technologies have enhanced the understanding of gene regulation in cells. In particular, Perturb-seq has enabled high-resolution profiling of the transcriptomic response to genetic perturbations at the single-cell level. This understanding has implications for functional genomics and potentially for identifying therapeutic targets. Various computational models have been developed to predict perturbational effects. While deep learning models excel at interpolating observed perturbational data, they tend to overfit when data are scarce and may not generalize well to unseen perturbations. In contrast, mechanistic models, such as linear causal models based on gene regulatory networks, hold greater potential for extrapolation, as they encapsulate regulatory information that can predict responses to unseen perturbations. However, their application has been limited to small studies due to overly simplistic assumptions, making them less effective in handling noisy, large-scale single-cell data. We propose a hybrid approach that combines a mechanistic causal model with variational deep learning, termed Single Cell Causal Variational Autoencoder (SCCVAE). The mechanistic model employs a learned regulatory network to represent perturbational changes as shift interventions that propagate through the learned network. SCCVAE integrates this mechanistic causal model into a variational autoencoder, generating rich, comprehensive transcriptomic responses. Our results indicate that SCCVAE outperforms current state-of-the-art baselines when extrapolating to unseen perturbational responses. Additionally, for the observed perturbations, the latent space learned by SCCVAE allows for the identification of functional perturbation modules and simulation of single-gene knockdown experiments of varying penetrance, presenting a robust tool for interpreting and interpolating perturbational responses at the single-cell level.
Title: Learning genetic perturbation effects with variational causal inference.
Journal: PLoS Computational Biology 22(2): e1013194. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12885384/pdf/
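The idea of a shift intervention propagating through a linear causal model, mentioned in the abstract above, can be sketched in a few lines. This is only an illustration of the general concept, not SCCVAE: the three-gene network, its edge weights, and the function name `propagate_shift` are hypothetical, and the real method learns the network inside a variational autoencoder.

```python
from graphlib import TopologicalSorter


def propagate_shift(parents, shifts, noise=None):
    """Evaluate a linear causal model x_g = sum_p w[p->g] * x_p + noise_g + shift_g.

    parents maps each gene to a dict {parent_gene: edge_weight}; the graph must
    be acyclic. Genes are evaluated so that parents always come before children.
    """
    noise = noise or {}
    expression = {}
    for gene in TopologicalSorter(parents).static_order():
        upstream = sum(w * expression[p] for p, w in parents.get(gene, {}).items())
        expression[gene] = upstream + noise.get(gene, 0.0) + shifts.get(gene, 0.0)
    return expression


# Hypothetical regulatory chain g1 -> g2 -> g3 with made-up weights.
network = {"g1": {}, "g2": {"g1": 0.8}, "g3": {"g2": 0.5}}

# A knockdown of g1 modeled as a shift of -1.0, with noise set to zero.
response = propagate_shift(network, shifts={"g1": -1.0})
```

Here the shift applied to `g1` reaches `g2` scaled by 0.8 and `g3` scaled by 0.8 × 0.5, which is the sense in which the perturbation propagates through the network.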
Pub Date: 2026-02-02 | eCollection Date: 2026-02-01 | DOI: 10.1371/journal.pcbi.1013918
Loïc Marrec, Sonja Lehtinen
Identifying the drivers of diversity remains a central challenge in microbial ecology. In microbiota, within-community diversity is often linked to host health, which makes it all the more important to understand. Since many communities assemble de novo, microbial dispersal plays a critical role in shaping community structure during the early stages of assembly. While theoretical models typically assume microbes disperse individually, this overlooks cases where microbes disperse in clusters, for example during host feeding. Here, we investigate how cluster dispersal impacts species richness, between-community dissimilarity, and species abundance in the initial steps of microbial community assembly. We developed a model in which microbes disperse from a pool into communities as clusters and then replicate locally. Using both analytical and numerical approaches, we show that cluster dispersal promotes community homogenization by increasing within-community richness and reducing dissimilarity across communities, even at low dispersal rates. Moreover, it modulates the influence of local selection on microbial community assembly and, consequently, on species abundance. Our results demonstrate that cluster dispersal has distinct effects from simply increasing the dispersal rate. This work reveals new evidence for the role of cluster dispersal in the early dynamics of microbial community assembly.
Title: Cluster dispersal shapes microbial diversity during community assembly.
Journal: PLoS Computational Biology 22(2): e1013918. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12885383/pdf/
Pub Date: 2026-02-02 | eCollection Date: 2026-02-01 | DOI: 10.1371/journal.pcbi.1013888
Anna Shafer-Skelton, Timothy F Brady, John T Serences
Understanding 3D representations of spatial information, particularly in naturalistic scenes, remains a significant challenge in vision science. This is largely because of conceptual difficulties in disentangling higher-level 3D information from co-occurring features and cues (e.g., the 3D shape of a scene image is necessarily defined by "low-level" spatial frequency and orientation information). Recent work has employed newer models and analysis techniques that attempt to mitigate these difficulties within a model-comparison framework. For example, one such study reported that 3D-surface features were uniquely present in areas OPA, PPA, and MPA/RSC (areas typically referred to as "scene-selective"), above and beyond a Gabor-wavelet baseline model. Here, we tested whether these findings generalized to a new stimulus set that, on average, dissociated static Gabor-wavelet baseline features from 3D scene-surface features. Surprisingly, we found evidence that a Gabor-wavelet baseline model (commonly thought of as a "low-level" or "2D" model) fit voxel responses in areas OPA, PPA, and MPA/RSC better than a model with 3D-surface information. We highlight that this difference in results could be due to differences in the baseline conditions used across studies. These findings emphasize that much of the information in "scene-selective" regions (potentially even information about 3D surfaces) may be in the form of spatial frequency and orientation information often considered 2D or low-level. Disentangling lower-level and higher-level visual information is a continuing fundamental challenge for model-comparison approaches in visual cognition, and it motivates future work investigating which visual features could cue higher-level properties in our real-world visual experience, both within and beyond current model-comparison frameworks.
Title: A 2D Gabor-wavelet baseline model out-performs a 3D surface model in scene-responsive cortex.
Journal: PLoS Computational Biology 22(2): e1013888. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12880747/pdf/
Pub Date: 2026-01-30 | eCollection Date: 2026-01-01 | DOI: 10.1371/journal.pcbi.1013931
Verna Heikkinen, Susanne Merz, Riitta Salmelin, Sampsa Vanhatalo, Leena Lauronen, Mia Liljeström, Hanna Renvall
Human brain dynamics are highly individual-specific: functional neuroimaging studies have recently described functional features that can be used as neural fingerprints. However, the stability of these fingerprints is affected by aging and disease. As such, the stability of brain fingerprints may be a useful metric when studying normal and pathological neurodevelopment. Before examining clinically relevant deviations, the individual stability and variation of neuroimaging features across brain maturation in normally developing children need to be addressed with real clinical data. Here we applied Bayesian reduced-rank regression (BRRR) to extract low-dimensional representations of electroencephalography (EEG) power spectra measured during different non-REM sleep stages (N1 and N2) from 782 normally developing children aged between 6 weeks and 19 years. The representations learned within specific sleep stages successfully distinguished between subjects and generalized across sleep stages. Fingerprint stability increased with the age of the subjects. Compared to correlation-based fingerprinting methods, the BRRR model performed better, especially in fingerprinting across sleep stages, highlighting the usefulness of dimensionality reduction when the noise and signal of interest are correlated. While further studies are needed to address the possible non-linear maturation effects over developmental periods, our results demonstrate the existence of stable within-session neurofunctional fingerprints in pediatric populations.
Title: Capturing individual variation in children's electroencephalograms during nREM sleep.
Journal: PLoS Computational Biology 22(1): e1013931. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12885382/pdf/
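The correlation-based fingerprinting that the abstract above uses as a comparison can be sketched as follows: each subject's spectrum from one sleep stage is matched to the most highly correlated spectrum from the other stage. This is a generic illustration of that baseline idea, not the BRRR model and not the authors' pipeline. The four-bin "spectra", subject labels, and function names are invented for the example.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)


def identify(query, database):
    """Match each query subject to the database subject with the highest correlation."""
    return {
        subject: max(database, key=lambda candidate: pearson(spectrum, database[candidate]))
        for subject, spectrum in query.items()
    }


# Invented toy spectra (4 frequency bins) for three subjects in two sleep stages.
stage_n1 = {"s1": [4.0, 3.0, 2.0, 1.0], "s2": [1.0, 3.0, 3.0, 1.0], "s3": [1.0, 2.0, 3.0, 4.0]}
stage_n2 = {"s1": [4.2, 2.9, 2.1, 0.8], "s2": [0.9, 3.2, 2.8, 1.1], "s3": [1.1, 1.9, 3.2, 3.9]}

matches = identify(stage_n1, stage_n2)
accuracy = sum(matches[s] == s for s in matches) / len(matches)
```

Identification accuracy, the fraction of subjects matched to themselves, is one simple way to express how stable a fingerprint is across stages.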
Pub Date: 2026-01-30 | eCollection Date: 2026-01-01 | DOI: 10.1371/journal.pcbi.1013933
Jian Miao, Dawei Li
Transposable element (TE) variants, the presence or absence of TE sequences such as LINE-1, Alu, SVA, and endogenous retroviruses, are a major source of genomic diversity and play critical roles in human health, evolution, and disease. As interest in TE variants grows, developing related methods and tools for detection has become increasingly important. However, rigorous benchmarking of TE variant detection methods remains limited due to the lack of accurate and scalable TE variant simulation platforms and the absence of reliable ground truth data. Here, we developed TEvarSim, a novel TE variant simulator that generates TE-containing genomic data in multiple formats, including genomes, short- and long-read sequencing data, and VCF files. TEvarSim supports both random and real-world TE insertions and deletions, including variants derived from pangenome graphs. It can rapidly simulate hundreds to thousands of synthetic chromosomes or genomes and model natural variation at the haplotype, individual, and population levels, making it well suited for large-scale studies. In addition, TEvarSim can directly compare simulated VCF files with TEs reported by TE detection tools, streamlining the benchmarking of TE genotyping methods. TEvarSim provides an all-in-one toolkit for simulating, evaluating, and improving TE variant detection, advancing our ability to accurately study TEs in health and disease in various species.
Title: TEvarSim: A genome simulator for transposable element (TE) variants.
Journal: PLoS Computational Biology 22(1): e1013933. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12875575/pdf/
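To make the notion of a simulated TE insertion with a matching ground-truth VCF record concrete, here is a minimal sketch. It does not use or reproduce TEvarSim's interface: the function `insert_te`, the toy reference, and the placeholder insert `GGCC` (not a real TE sequence) are assumptions. The record follows the usual VCF convention for insertions, where REF is the base before the insertion and ALT is that base followed by the inserted sequence.

```python
def insert_te(reference, te_sequence, anchor_pos, chrom="chr1"):
    """Insert te_sequence right after the 1-based position anchor_pos.

    Returns the altered sequence and a tab-separated VCF-style data line
    (CHROM, POS, ID, REF, ALT, QUAL, FILTER, INFO) describing the insertion.
    """
    if not 1 <= anchor_pos <= len(reference):
        raise ValueError("anchor_pos must lie within the reference")
    anchor_base = reference[anchor_pos - 1]
    altered = reference[:anchor_pos] + te_sequence + reference[anchor_pos:]
    fields = [chrom, str(anchor_pos), ".", anchor_base,
              anchor_base + te_sequence, ".", ".", "SVTYPE=INS"]
    return altered, "\t".join(fields)


# Toy reference and placeholder insert; real TEs are hundreds to thousands of bases long.
altered, vcf_line = insert_te("ACGTACGT", "GGCC", anchor_pos=4)
```

Keeping the altered sequence and the record together is what allows a detection tool's output to be scored against a known truth set.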
The rational utilization of multimodal spatial transcriptomics (ST) data enables accurate identification of spatial domains, which is essential for investigating cellular structure and functions. In this study, we proposed SpaConTDS, a novel framework that integrates reinforcement learning with self-supervised multimodal contrastive learning. SpaConTDS generates positive and negative samples through data augmentation and a pseudo-label tuple perturbation strategy, enabling the learning of fused representations that capture global semantics and cross-modal interactions. The model's hyper-parameters are dynamically optimized using reinforcement learning. Extensive experiments across various resolutions and platforms demonstrate that SpaConTDS achieves state-of-the-art accuracy in spatial domain identification and outperforms existing methods in downstream tasks such as denoising, trajectory inference, and UMAP visualization. Moreover, SpaConTDS effectively integrates multiple tissue sections and corrects batch effects without requiring prior alignment. Compared to existing approaches, SpaConTDS offers more robust fused representations of multimodal data, providing researchers with a flexible and powerful tool for a wide range of spatial transcriptomics analyses.
SpaConTDS: A multimodal contrastive learning framework for identifying spatial domains by applying tuple disturbing strategy. Ruiwen Xu, Xiaoqing Cheng, Waiki Ching, Siyao Wu, Yuanben Zhang, Yidan Zhang. PLoS Computational Biology 22(1): e1013893. Published 2026-01-29. DOI: 10.1371/journal.pcbi.1013893. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12854462/pdf/
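The abstract above does not state the loss SpaConTDS uses. As a generic illustration of how positive and negative samples enter a contrastive objective, the sketch below computes an InfoNCE-style loss for one anchor. The function names, the temperature value, and the toy vectors are assumptions made for this example only.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce(anchor, positive, negatives, temperature=0.5):
    """InfoNCE-style loss: low when the anchor is closer to its positive than to the negatives."""
    pos = math.exp(cosine(anchor, positive) / temperature)
    neg = sum(math.exp(cosine(anchor, n) / temperature) for n in negatives)
    return -math.log(pos / (pos + neg))


anchor = [1.0, 0.0]
augmented_view = [0.9, 0.1]            # toy "positive": a perturbed copy of the anchor
others = [[0.0, 1.0], [-1.0, 0.0]]     # toy "negatives": unrelated embeddings

loss_good = info_nce(anchor, augmented_view, others)
loss_bad = info_nce(anchor, [0.0, 1.0], [augmented_view, [-1.0, 0.0]])
print(loss_good < loss_bad)
```

Minimizing such a loss pulls an embedding toward its positive sample and away from the negatives, which is the general mechanism the abstract refers to.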
Quantifying cell morphology is central to understanding cellular regulation, fate, and heterogeneity, yet conventional image-based analyses often struggle with diverse or irregular shapes. We present a computational framework that uses topological data analysis to characterise and compare single-cell morphologies from fluorescence microscopy. Each cell is represented by its contour together with the position of its nucleus, from which we construct a filtration based on a radial distance function and derive a persistence diagram encoding the shape's topological evolution. The similarity between two cells is quantified using the 2-Wasserstein distance between their diagrams, yielding a shape distance we call the PH distance. We apply this method to two representative experimental systems, primary human mesenchymal stem cells (hMSCs) and HeLa cells, and show that PH distances enable the detection of outliers in those systems, the identification of sub-populations, and the quantification of shape heterogeneity. We benchmark PH against three established contour-based distances (aspect ratio, Fourier descriptors, and elastic shape analysis) and show that PH offers better separation between cell types and greater robustness when clustering heterogeneous populations. Together, these results demonstrate that persistent-homology-based signatures provide a principled and sensitive approach for analysing cell morphology in settings where traditional geometric or image-based descriptors are insufficient.
Persistence diagrams as morphological signatures of cells: A method to measure and compare cells within a population. Yossi Bokor Bleile, Pooja Yadav, Patrice Koehl, Florian Rehfeldt. PLoS Computational Biology 22(1): e1013890. Published 2026-01-28. DOI: 10.1371/journal.pcbi.1013890. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12871990/pdf/
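The pipeline described in the abstract above (radial distance from the nucleus, a persistence diagram, then a 2-Wasserstein distance) can be sketched on toy data as follows. This is an illustration under simplifying assumptions, not the authors' implementation. It tracks only 0-dimensional sublevel-set persistence on a closed polygonal contour and closes the never-dying component at the maximum value. It uses a Euclidean ground metric and finds the matching by brute force, so it suits only very small diagrams. All function names are hypothetical.

```python
import itertools
import math


def radial_profile(contour, nucleus):
    """Distance from the nucleus to each contour vertex, in contour order."""
    cx, cy = nucleus
    return [math.hypot(x - cx, y - cy) for x, y in contour]


def sublevel_persistence_cycle(values):
    """0-dimensional sublevel-set persistence pairs of a function on a closed loop."""
    n = len(values)
    parent, birth = {}, {}   # union-find forest and birth value of each root

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    pairs = []
    for i in sorted(range(n), key=lambda k: values[k]):
        parent[i], birth[i] = i, values[i]
        for j in ((i - 1) % n, (i + 1) % n):
            if j not in parent:
                continue
            ri, rj = find(i), find(j)
            if ri == rj:
                continue
            # Elder rule: the component born later dies when the two merge
            young, old = (ri, rj) if birth[ri] > birth[rj] else (rj, ri)
            if values[i] > birth[young]:
                pairs.append((birth[young], values[i]))
            parent[young] = old
    # Assumption: close the component of the global minimum at the maximum value
    pairs.append((min(values), max(values)))
    return pairs


def wasserstein2(diag_a, diag_b):
    """2-Wasserstein distance between diagrams; points may be matched to the diagonal."""
    n, m = len(diag_a), len(diag_b)
    size = n + m   # each diagram is padded with diagonal slots for the other's points

    def cost(i, j):
        a = diag_a[i] if i < n else None
        b = diag_b[j] if j < m else None
        if a is None and b is None:
            return 0.0
        if a is None:
            return (b[1] - b[0]) ** 2 / 2.0   # squared distance to the diagonal
        if b is None:
            return (a[1] - a[0]) ** 2 / 2.0
        return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2

    best = min(
        sum(cost(i, p[i]) for i in range(size))
        for p in itertools.permutations(range(size))
    )
    return math.sqrt(best)


round_cell = [(2, 0), (0, 2), (-2, 0), (0, -2)]     # toy contours, nucleus at the origin
long_cell = [(4, 0), (0, 1), (-4, 0), (0, -1.5)]
diag_round = sublevel_persistence_cycle(radial_profile(round_cell, (0, 0)))
diag_long = sublevel_persistence_cycle(radial_profile(long_cell, (0, 0)))
print(wasserstein2(diag_round, diag_long))
```

For realistic contours with hundreds of vertices, the brute-force matching would need to be replaced by a proper assignment solver or a dedicated topological data analysis library.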
Pub Date: 2026-01-27; eCollection Date: 2026-01-01; DOI: 10.1371/journal.pcbi.1013904
Ryota Masuki, Donn Liew, Ee Hou Yong
Predicting RNA structures containing pseudoknots remains computationally challenging due to high processing costs and complexity. While standard methods for pseudoknot prediction require O(N^6) time complexity, we present a hierarchical approach that significantly reduces computational cost while maintaining prediction accuracy. Our method analyzes RNA structures by dividing them into contiguous regions of unpaired bases ("sections") derived from known secondary structures. We examine pseudoknot interactions between sections using a nearest-neighbor energy model with dynamic programming. Our algorithm scales as [Formula: see text], offering substantial computational advantages over existing global prediction methods. Analysis of 726 transfer messenger RNA and 454 Ribonuclease P RNA sequences reveals that biologically relevant pseudoknots are highly concentrated among section pairs with large minimum free energy (MFE) gain. Over 90% of connected section pairs appear within just the top 3% of section pairs ranked by MFE gain. For 2-clusters, our method achieves high prediction accuracy with sensitivity exceeding 0.9 and positive predictive value above 0.8. For 3-clusters, we discovered asymmetric behavior where "former" section pairs (formed early in the sequence) are predicted accurately, while "latter" section pairs do not follow local energy predictions. This asymmetry suggests that complex pseudoknot formation follows sequential co-transcriptional folding rather than global energy minimization, providing insights into RNA folding dynamics.
Hierarchical analysis of RNA secondary structures with pseudoknots based on sections. Ryota Masuki, Donn Liew, Ee Hou Yong. PLoS Computational Biology 22(1): e1013904. Published 2026-01-27. DOI: 10.1371/journal.pcbi.1013904. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12858078/pdf/
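The abstract above defines "sections" as contiguous regions of unpaired bases in a known secondary structure and reports sensitivity and positive predictive value. Both can be illustrated with a short sketch. It assumes the structure is given in dot-bracket notation, which the abstract does not specify, and the function names and toy inputs are hypothetical. The energy model and dynamic programming step are not reproduced here.

```python
def unpaired_sections(dot_bracket):
    """Return (start, end) 0-based inclusive index pairs for maximal runs of unpaired bases ('.')."""
    sections = []
    start = None
    for i, ch in enumerate(dot_bracket):
        if ch == ".":
            if start is None:
                start = i                      # a new run of unpaired bases begins
        elif start is not None:
            sections.append((start, i - 1))    # a paired base closes the current run
            start = None
    if start is not None:
        sections.append((start, len(dot_bracket) - 1))
    return sections


def sensitivity_ppv(predicted, reference):
    """Sensitivity = TP / (TP + FN); positive predictive value = TP / (TP + FP)."""
    predicted, reference = set(predicted), set(reference)
    true_pos = len(predicted & reference)
    sensitivity = true_pos / len(reference) if reference else 0.0
    ppv = true_pos / len(predicted) if predicted else 0.0
    return sensitivity, ppv


structure = "..((...))..(((....)))."           # toy secondary structure
sections = unpaired_sections(structure)
print(sections)

# Toy section pairs, identified by their indices in the sections list
predicted_pairs = [(0, 2), (1, 3), (2, 4)]
reference_pairs = [(0, 2), (1, 3)]
sens, ppv = sensitivity_ppv(predicted_pairs, reference_pairs)
print(sens, ppv)
```

In this toy case every reference pair is recovered, so sensitivity is 1.0, while one of the three predictions is spurious, so the positive predictive value is 2/3.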