Genomic surveillance of pathogen evolution is essential for public health response, treatment strategies, and vaccine development. In the context of SARS-CoV-2, multiple models have been developed, including Multinomial Logistic Regression (MLR) describing variant frequency growth, as well as the Fixed Growth Advantage (FGA), Growth Advantage Random Walk (GARW) and Piantham parameterizations describing variant Rt. These models provide estimates of variant fitness and can be used to forecast changes in variant frequency. We introduce a framework for evaluating real-time forecasts of variant frequencies, and apply this framework to the evolution of SARS-CoV-2 during 2022, in which multiple new viral variants emerged and rapidly spread through the population. We compare models across representative countries with different intensities of genomic surveillance. Retrospective assessment of model accuracy highlights that most models of variant frequency perform well and are able to produce reasonable forecasts. We find that the simple MLR model provides ∼0.6% median absolute error and ∼6% mean absolute error when forecasting 30 days out for countries with robust genomic surveillance. We investigate the impacts of sequence quantity and quality across countries on forecast accuracy and conduct systematic downsampling to identify that 1000 sequences per week is sufficient for accurate short-term forecasts. We conclude that fitness models represent a useful prognostic tool for short-term evolutionary forecasting.
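The MLR model described above treats variant frequencies as a softmax over variant-specific intercepts and growth rates. A minimal sketch of the forecasting step, using hypothetical fitted coefficients (the function name and values below are illustrative, not from the paper):

```python
import numpy as np

def mlr_frequency_forecast(intercepts, growth_rates, t):
    """Forecast variant frequencies at time t (days) under an MLR model:
    p_v(t) = softmax(a_v + b_v * t), where b_v encodes each variant's
    growth advantage relative to a reference variant."""
    logits = np.asarray(intercepts) + np.asarray(growth_rates) * t
    logits -= logits.max()              # for numerical stability
    weights = np.exp(logits)
    return weights / weights.sum()

# Hypothetical coefficients for three variants (variant 0 is the reference)
a = [0.0, -1.0, -3.0]
b = [0.0, 0.05, 0.12]                   # per-day relative growth rates
freqs_30 = mlr_frequency_forecast(a, b, t=30.0)
```

In practice the coefficients would be fit by maximum likelihood to sequence counts per day; the forecast itself is then just this softmax evaluated at a future time.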
Effective analysis of single-cell RNA sequencing (scRNA-seq) data requires a rigorous distinction between technical noise and biological variation. In this work, we propose a simple feature selection model, termed "Differentially Distributed Genes" or DDGs, where a binomial sampling process for each mRNA species produces a null model of technical variation. Using scRNA-seq data where cell identities have been established a priori, we find that the DDG model of biological variation outperforms existing methods. We demonstrate that DDGs distinguish a validated set of real biologically varying genes, minimize neighborhood distortion, and enable accurate partitioning of cells into their established cell-type groups.
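The DDG idea of a binomial sampling null can be sketched concretely: if sequencing draws each cell's reads as a multinomial sample over genes, a gene's per-cell counts under pure technical noise are approximately binomial, with a variance fixed by sequencing depth and the gene's pooled proportion. The scoring function below is an illustrative simplification of that null, not the paper's exact statistic:

```python
import numpy as np

def ddg_scores(counts):
    """Score genes by excess variance over a binomial-sampling null.

    counts: (cells x genes) integer matrix. Under the null, cell c's count
    for gene g is Binomial(n_c, p_g), with n_c the cell's total depth and
    p_g the gene's pooled proportion, so the null variance in cell c is
    n_c * p_g * (1 - p_g). Scores far above 1 suggest variation beyond
    technical noise, i.e. candidate differentially distributed genes.
    """
    counts = np.asarray(counts, dtype=float)
    depths = counts.sum(axis=1)                # n_c per cell
    p = counts.sum(axis=0) / depths.sum()      # pooled p_g per gene
    null_var = (depths[:, None] * p[None, :] * (1 - p[None, :])).mean(axis=0)
    obs_var = counts.var(axis=0, ddof=1)
    return obs_var / np.maximum(null_var, 1e-12)
```

A gene whose true expression differs between cell types will show between-group variance far above this binomial floor, while a uniformly expressed gene scores near 1.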
Reverse epidemiology is a mathematical modelling tool used to ascertain information about the source of a pathogen, given the spatial and temporal distribution of cases, hospitalisations and deaths. In the context of a deliberately released pathogen, such as Bacillus anthracis (the disease-causing organism of anthrax), this can allow responders to quickly identify the location and timing of the release, as well as other factors such as the strength of the release and the realized wind speed and direction at release. These estimates can then be used to parameterise a predictive mechanistic model, allowing for estimation of the potential scale of the release and optimisation of the distribution of prophylaxis. In this paper we present two novel approaches to reverse epidemiology, demonstrate their utility in responding to a simulated deliberate release of B. anthracis in ten locations in the UK, and compare these to the standard grid-search approach. The two methods, a modified MCMC and a Recurrent Convolutional Neural Network, are able to identify the source location and timing of the release with significantly better accuracy than the grid-search approach. Further, the neural network method is able to perform inference on new data significantly faster than either the grid-search or novel MCMC methods, allowing for rapid deployment in time-sensitive outbreaks.
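The inference skeleton of the MCMC approach can be illustrated with a deliberately simplified toy: here the "dispersal model" is just an isotropic Gaussian scatter of cases around the source, standing in for the mechanistic atmospheric-dispersal likelihood the paper uses. Everything below (function name, likelihood, parameters) is an assumption for illustration:

```python
import numpy as np

def mh_source_inference(case_xy, n_iter=5000, step=0.05, seed=0):
    """Toy Metropolis-Hastings sampler for a release location (x, y).

    Assumes case locations scatter isotropically around the source with
    known spread sigma; a real reverse-epidemiology likelihood would
    instead come from a mechanistic dispersal/dose-response model.
    `step` should be matched to the posterior scale for good mixing."""
    rng = np.random.default_rng(seed)
    sigma = 1.0

    def log_lik(theta):
        d2 = ((case_xy - theta) ** 2).sum(axis=1)
        return -0.5 * d2.sum() / sigma**2

    theta = case_xy.mean(axis=0)        # initialize at the case centroid
    ll = log_lik(theta)
    samples = []
    for _ in range(n_iter):
        prop = theta + rng.normal(scale=step, size=2)
        ll_prop = log_lik(prop)
        if np.log(rng.uniform()) < ll_prop - ll:   # Metropolis accept step
            theta, ll = prop, ll_prop
        samples.append(theta)
    return np.array(samples)
```

Swapping in a plume-based likelihood and adding release time and strength as parameters would recover the structure of the full inference problem.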
Learning to read poses a strong challenge to the visual system. Years of expertise lead to a remarkable capacity to separate similar letters and encode their relative positions, thus distinguishing words such as FORM and FROM, invariantly over a large range of positions, sizes and fonts. How neural circuits achieve invariant word recognition remains unknown. Here, we address this issue by recycling deep neural network models initially trained for image recognition. We retrain them to recognize written words and then analyze how reading-specialized units emerge and operate across the successive layers. With literacy, a small subset of units becomes specialized for word recognition in the learned script, similar to the visual word form area (VWFA) in the human brain. We show that these units are sensitive to specific letter identities and their ordinal position from the left or the right of a word. The transition from retinotopic to ordinal position coding is achieved by a hierarchy of "space bigram" units that detect the position of a letter relative to a blank space and that pool across low- and high-frequency-sensitive units from early layers of the network. The proposed scheme provides a plausible neural code for written words in the VWFA, and leads to predictions for reading behavior, error patterns, and the neurophysiology of reading.
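The ordinal "space bigram" code described above can be sketched as a simple feature extractor: each letter is tagged with its identity plus its position counted from the left or right word edge (the blank space). The function below is an illustrative toy, not the network's actual units, but it shows why anagrams such as FORM and FROM receive distinct codes:

```python
def space_bigram_features(word, max_pos=4):
    """Toy ordinal 'space bigram' code: each letter is encoded by its
    identity together with its distance from the left ('L') or right
    ('R') word edge, up to max_pos positions in from each edge."""
    feats = set()
    n = len(word)
    for i, ch in enumerate(word):
        if i < max_pos:
            feats.add(('L', i, ch))          # position from the left space
        if n - 1 - i < max_pos:
            feats.add(('R', n - 1 - i, ch))  # position from the right space
    return feats
```

Because positions are anchored to the word edges rather than to retinal location, the same code is produced wherever the word appears in the visual field, giving position-invariant recognition for free.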
In nature, most microbial populations have complex spatial structures that can affect their evolution. Evolutionary graph theory predicts that some spatial structures modelled by placing individuals on the nodes of a graph affect the probability that a mutant will fix. Evolution experiments are beginning to explicitly address the impact of graph structures on mutant fixation. However, the assumptions of evolutionary graph theory differ from the conditions of modern evolution experiments, making the comparison between theory and experiment challenging. Here, we aim to bridge this gap by using our new model of spatially structured populations. This model considers connected subpopulations that lie on the nodes of a graph, and allows asymmetric migrations. It can handle large populations, and explicitly models serial passage events with migrations, thus closely mimicking experimental conditions. We analyze recent experiments in light of this model. We suggest useful parameter regimes for future experiments, and we make quantitative predictions for these experiments. In particular, we propose experiments to directly test our recent prediction that the star graph with asymmetric migrations suppresses natural selection and can accelerate mutant fixation or extinction, compared to a well-mixed population.
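The core mechanics of the model described above, demes on graph nodes, serial passage with dilution, and possibly asymmetric migrations, can be sketched in one update step. The function below is a simplified illustration (deterministic within-deme selection, binomial bottleneck sampling), not the paper's exact model:

```python
import numpy as np

def passage_with_migration(mutant_frac, deme_size, fitness, mig, rng):
    """One serial-passage step for subpopulations on the nodes of a graph
    (illustrative sketch). Growth is deterministic selection within each
    deme; dilution is binomial sampling at the bottleneck; `mig` is a
    row-stochastic migration matrix, mig[i, j] = fraction of deme i's
    bottleneck drawn from deme j, so asymmetric migrations are allowed."""
    x = np.asarray(mutant_frac, dtype=float)
    grown = fitness * x / (fitness * x + (1 - x))  # selection within demes
    pooled = mig @ grown                           # migration between demes
    return rng.binomial(deme_size, pooled) / deme_size
```

Iterating this map until all demes reach frequency 0 or 1 yields fixation probabilities and times; choosing `mig` to encode a star graph with unequal center-to-leaf and leaf-to-center rates is how the suppression-of-selection prediction mentioned above would be probed.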
Surveillance systems that monitor pathogen genome sequences are critical for rapidly detecting the introduction and emergence of pathogen variants. To evaluate how interactions between surveillance capacity, variant properties, and the epidemiological context influence the timeliness of pathogen variant detection, we developed a geographically explicit stochastic compartmental model to simulate the transmission of a novel SARS-CoV-2 variant in New York City. We measured the impact of (1) testing and sequencing volume, (2) geographic targeting of testing, (3) the timing and location of variant emergence, and (4) the relative variant transmissibility on detection speed and on the undetected disease burden. Improvements in detection times and reduction of undetected infections were driven primarily by increases in the number of sequenced samples. The relative transmissibility of the new variant and the epidemic context of variant emergence also influenced detection times, showing that individual surveillance strategies can result in a wide range of detection outcomes, depending on the underlying dynamics of the circulating variants. These findings help contextualize the design, interpretation, and trade-offs of genomic surveillance strategies of pandemic respiratory pathogens.
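The central quantity in this kind of analysis, time to first detection as a function of sequencing volume, can be illustrated with a stripped-down Monte Carlo: each day a fixed number of samples is sequenced uniformly at random from all cases, and the variant is detected the first day a variant case is among them. The inputs and function below are hypothetical, not the paper's NYC model:

```python
import numpy as np

def expected_detection_day(variant_daily, total_daily, seqs_per_day,
                           rng, n_sim=2000):
    """Monte Carlo estimate of the mean day on which a growing variant is
    first detected, given daily variant and total case counts and a fixed
    daily sequencing volume. Returns the horizon length if undetected."""
    days = len(variant_daily)
    hits = np.empty(n_sim)
    for s in range(n_sim):
        t = days                     # sentinel: undetected within horizon
        for d in range(days):
            p = min(variant_daily[d] / total_daily[d], 1.0)
            if rng.binomial(seqs_per_day, p) > 0:
                t = d
                break
        hits[s] = t
    return hits.mean()
```

Running this with a larger `seqs_per_day` shifts detection earlier, mirroring the abstract's finding that sequenced-sample volume is the primary driver of detection timeliness.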