Molecule generation is advancing rapidly in chemical discovery and drug design. Flow-matching methods have recently set the state of the art (SOTA) in unconditional molecule generation, surpassing score-based diffusion models. However, diffusion models still lead in property-guided generation. In this work, we introduce PropMolFlow, an approach for property-guided molecule generation based on geometry-complete SE(3)-equivariant flow matching. Integrating five different property embedding methods with a Gaussian expansion of scalar properties, PropMolFlow achieves competitive performance against previous SOTA diffusion models in conditional molecule generation while maintaining high structural stability and validity. Additionally, it enables higher sampling speed with fewer time steps compared with baseline models. We highlight the importance of validating the properties of generated molecules through density functional theory calculations. Furthermore, we introduce a task that assesses the model's ability to propose molecules with under-represented property values, probing its capacity for out-of-distribution generalization.
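To illustrate what a Gaussian expansion of a scalar property can look like, the following minimal sketch maps one scalar onto a vector of Gaussian basis functions. The function name `gaussian_expansion`, the evenly spaced centers and the default width are assumptions made for illustration; the abstract does not specify how PropMolFlow parameterizes its expansion.

```python
import numpy as np

def gaussian_expansion(value, start=0.0, stop=1.0, num_centers=8, width=None):
    """Expand a scalar into Gaussian basis features (illustrative sketch only)."""
    # Assumption: centers are evenly spaced over [start, stop].
    centers = np.linspace(start, stop, num_centers)
    if width is None:
        # Assumption: the width equals the spacing between neighboring centers.
        width = centers[1] - centers[0]
    # Each feature is largest when the value sits on the matching center.
    return np.exp(-((value - centers) ** 2) / (2.0 * width ** 2))

# Example: a property value of 0.5 expanded over five centers in [0, 1].
features = gaussian_expansion(0.5, start=0.0, stop=1.0, num_centers=5)
```

The resulting vector gives a smooth, fixed-length encoding of the scalar that a downstream embedding layer could consume in place of the raw number.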
Spatial epigenomics (SE) technologies profile epigenomic landscapes within intact tissues, preserving spatial context and enabling the study of gene regulatory mechanisms in situ. However, current SE datasets typically suffer from low signal detection, substantial noise and extremely sparse peak matrices, which pose considerable challenges for downstream analysis. Here we introduce SPEED (spatial epigenomic data denoising), a deep matrix factorization framework that leverages atlas-level single-cell epigenomic data and spatial context to impute and denoise SE data. In comprehensive benchmarks on both simulated data and real SE tissue datasets, SPEED outperformed five state-of-the-art methods across diverse tissues and technologies. Moreover, SPEED's denoised outputs facilitated downstream analyses such as differential chromatin accessibility analysis, epigenomic spatial domain identification and gene activity inference. Collectively, our results indicate that SPEED is a generalizable tool for improving data quality and biological insights in SE.
Conventional epigenetic clocks encounter challenges in generalizability, especially when there are pronounced batch effects between the training and test datasets, restricting their clinical applicability for aging assessment. Here we present MAPLE, a robust computational framework for methylation age and disease-risk prediction through pairwise learning. MAPLE utilizes pairwise learning to discern the relative relationships between two DNA methylation profiles regarding age or disease risk. It effectively identifies aging- or disease-related biological signals while mitigating technical biases in the data. MAPLE outperforms five competing methods, achieving a median absolute error of 1.6 years across 31 benchmark tests from diverse studies, sequencing platforms, data preprocessing methods and tissue types. Furthermore, MAPLE performs well when assessing aging-related disease risk, with mean areas under the curve of 0.97 for disease identification and 0.85 for pre-disease status detection. Overall, we show that MAPLE has great potential for assessing epigenetic age and aging-related disease risk clinically.
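The idea of learning from relative relationships between two profiles can be sketched with a toy linear model. This is not MAPLE's architecture, which the abstract does not describe in detail. It assumes synthetic data, a purely additive batch offset, and a hypothetical helper `make_pairs` that turns samples into pairwise differences. Under those assumptions, a constant offset shared by all samples in a batch cancels when two profiles are subtracted.

```python
import numpy as np

def make_pairs(X, y):
    """Build all unordered sample pairs as (profile difference, label difference)."""
    i, j = np.triu_indices(len(y), k=1)
    return X[i] - X[j], y[i] - y[j]

rng = np.random.default_rng(0)
n_samples, n_sites = 40, 5

# Synthetic toy data: "age" is a linear function of the methylation-like features.
true_w = rng.normal(size=n_sites)
X = rng.normal(size=(n_samples, n_sites))
age = X @ true_w + 50.0

# Assumption: the batch effect is a constant additive shift on every feature.
X_shifted = X + 3.0

# Pairwise differences remove both the shift and the intercept.
D, d_age = make_pairs(X_shifted, age)
w_hat, *_ = np.linalg.lstsq(D, d_age, rcond=None)
```

In this toy setting the weights fitted on pairwise differences match the true weights despite the shift. Real batch effects are rarely purely additive, so this shows only the intuition.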
Spatial transcriptomics has transformed the mapping of gene expression within intact tissues, yet current sequencing-based platforms are limited by coarse spot-level resolution and sparse sampling that leaves large interspot regions unmeasured. Here we introduce PanoSpace, a computational framework that integrates low-resolution spatial transcriptomics with high-resolution histology and matched single-cell RNA sequencing to reconstruct a continuous, single-cell-level map across entire tissue sections. Originally developed for tumors, PanoSpace accurately reconstructs cellular locations, cell identities and gene expression profiles, enabling detailed characterization of intracell-type heterogeneity and spatially organized cell-cell interactions. Application to breast and prostate cancers reveals complex cellular architectures and tumor microenvironment dynamics mediated by cancer-associated fibroblasts. Thanks to its modular design, PanoSpace can be seamlessly adapted to noncancerous tissues, as demonstrated by precise spatial reconstruction in mouse brain. Together, these results demonstrate that PanoSpace enables comprehensive spatial transcriptomic analysis and facilitates biological discovery.
In multicellular systems, cell fate determination emerges from the integration of intracellular signaling and intercellular communication. Spatial transcriptomics (ST) provides opportunities to elucidate these regulatory processes, yet inferring the spatiotemporal dynamics of cell state transitions (CSTs) governed by cell-cell communication (CCC) remains a challenge. Here we introduce CCCvelo to reconstruct CCC-driven CST dynamics by jointly optimizing a dynamic CCC signaling network and a latent CST clock. CCCvelo formulates a unified multiscale nonlinear kinetic model that integrates intercellular ligand-receptor signaling gradients with intracellular transcription-factor activation cascades to capture gene expression dynamics encoding CSTs. Moreover, we devise PINN-CELL, a physics-informed neural-network-based coevolution learning algorithm, which simultaneously optimizes model parameters and pseudotemporal ordering. Application of CCCvelo to high-resolution ST datasets, including mouse cortex, embryonic trunk development and human prostate cancer datasets, demonstrates its ability to successfully recover known morphogenetic trajectories while uncovering dynamic CCC signaling rewiring that orchestrates CST progression.
Here we introduce a framework for molecular structure optimization using a denoising model on a physics-informed Riemannian manifold (R-DM). Unlike conventional approaches operating in Euclidean space, our method leverages a Riemannian metric that better aligns with molecular energy change, enabling more robust modeling of potential energy surfaces. By incorporating internal coordinates reflective of energetic properties, R-DM achieves chemical accuracy with an energy error below 1 kcal mol⁻¹. Comparative evaluations on QM9, QM7-X and GEOM datasets demonstrate improvements in both structural and energetic accuracy, surpassing conventional Euclidean-based denoising models. This approach highlights the potential of physics-informed coordinates for tackling complex molecular optimization problems, with implications for tasks in computational chemistry and materials science.
Prediction models that generate neuronal spikes from upstream neural activities offer a promising way to re-establish neural functional connectivity. Traditional methods train these models by supervised learning, which requires downstream recordings as ground truth. However, functional downstream activity cannot be recorded when neurological disorders exist. Here we introduce a reinforcement learning (RL)-based point process framework to generate spike trains that directly maximize behavior-level rewards, thus bypassing downstream recordings. This yields a generative spike model that directly transforms upstream activity into spike patterns modulated to desired behavior. We show that these RL-based generative models produce movement-modulated spike patterns akin to downstream recordings from healthy subjects, providing a biomimetic spike encoding framework. This RL framework outperforms existing methods and demonstrates a strong adaptation capability across different decoder settings, highlighting its potential for neural prostheses in restoring transregional communication with biomimetic cortical stimulation.
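As a rough illustration of training a spike generator from reward alone, the sketch below uses a discrete-time Bernoulli approximation of a point process and a plain REINFORCE update. All of it is assumed for illustration: the two-dimensional upstream input, the hypothetical reward rule and the function `train_spike_policy` are not taken from the paper, whose actual model and reward design are not given in the abstract.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_spike_policy(n_steps=2000, lr=0.1, seed=0):
    """Learn spike-generation weights from a scalar reward (toy REINFORCE sketch)."""
    rng = np.random.default_rng(seed)
    w = np.zeros(2)
    baseline = 0.0  # running average of reward, used to reduce variance
    for _ in range(n_steps):
        x = rng.normal(size=2)        # upstream activity in one time bin
        p = sigmoid(w @ x)            # spike probability for this bin
        s = float(rng.random() < p)   # sampled spike (1) or silence (0)
        # Hypothetical behavior-level reward: the behavior succeeds when the
        # model spikes in bins where the first upstream feature is positive.
        desired = float(x[0] > 0)
        reward = 1.0 if s == desired else 0.0
        # REINFORCE: d log P(s) / d w = (s - p) * x for a Bernoulli policy.
        w += lr * (reward - baseline) * (s - p) * x
        baseline += 0.05 * (reward - baseline)
    return w

w = train_spike_policy()
```

The learner never sees a recorded downstream spike train. It only receives the scalar reward, which is the property the abstract emphasizes.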

