Bitterness, alongside sourness, sweetness, umami, and saltiness, is one of the five basic tastes and a key dimension of food flavor profiles. Food protein processing readily generates bitter peptides whose intense bitterness often leads to consumer rejection, yet these peptides frequently carry beneficial bioactivities, forcing a trade-off between flavor and functionality. Managing this trade-off requires quantitative assessment of bitterness intensity in the early stages of product development. However, experimental assays relying on sensory evaluation and electronic tongue instruments are complex, costly, and limited in throughput, constraining the systematic identification of bitter peptides and process optimization. Here, we present BIPE (Bitterness Intensity Prediction Engine), an end-to-end regression model that integrates ESM3 protein language model representations with a multilayer perceptron readout, performing regression of bitterness thresholds in log space to directly assess bitterness intensity from sequence alone. BIPE achieves R² = 0.9050 under 10-fold cross-validation and R² = 0.9449 on an independent test set. BIPE accurately reproduces trends in both electronic tongue readouts and human sensory scores, demonstrating consistent external validity across assays. In addition, BIPE accurately differentiates the bitterness intensities of soybean protein hydrolysates produced by multiple commercial proteases. Finally, systematic scanning of the complete pentapeptide sequence space by BIPE reveals amino acid compositional patterns associated with bitterness, providing mechanistic insights. By advancing from classification to quantitative regression, BIPE enables rational design of low-bitterness peptides, supports flavor engineering and process optimization, and establishes a reusable baseline for taste modeling.
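The log-space regression target and the R² metric mentioned above can be illustrated with a minimal sketch. It is not BIPE's code: the helper names (to_log_space, from_log_space, r2_score) are hypothetical, the base-10 logarithm is an assumption, and the threshold values are made-up toy numbers rather than data from the study.

```python
import numpy as np

def to_log_space(thresholds):
    """Map positive bitterness thresholds to log10 space (assumed base)."""
    return np.log10(np.asarray(thresholds, dtype=float))

def from_log_space(log_values):
    """Invert the log10 transform to recover thresholds."""
    return 10.0 ** np.asarray(log_values, dtype=float)

def r2_score(y_true, y_pred):
    """Coefficient of determination: R^2 = 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

# Toy thresholds spanning three orders of magnitude (illustrative only).
toy_thresholds = np.array([0.05, 0.5, 5.0, 50.0])
toy_targets = to_log_space(toy_thresholds)

# Pretend predictions that are off by 0.1 log units each.
toy_predictions = toy_targets + np.array([0.1, -0.1, 0.1, -0.1])
toy_r2 = r2_score(toy_targets, toy_predictions)
```

Working in log space keeps errors comparable across thresholds that differ by orders of magnitude, which is one plausible reason for the choice described in the abstract.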
Selector is a free, open-source Python library for selecting diverse subsets from any dataset, making it a versatile tool across a wide range of application domains. Selector implements different subset sampling algorithms based on sample distance, similarity, and spatial partitioning along with metrics to quantify subset diversity. It is flexible and integrates seamlessly with popular Python libraries such as Scikit-Learn, demonstrating the interoperability of the implemented algorithms with data analysis workflows. Selector is an operating-system-agnostic, accessible, and easily extensible package designed with modern software development practices, including version control, unit testing, and continuous integration. Interactive quick-start notebooks, which are also web-accessible, provide user-friendly tutorials for all skill levels, showcasing applications in computational chemistry, drug discovery, and chemical library design. Additionally, a web interface has been developed that allows users to easily upload datasets, configure sampling settings, and run subset selection algorithms with no programming required. This work serves as the official release note for the Selector package, offering a technical overview of its features, use cases, and development practices that ensure its quality and maintainability.
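As an illustration of distance-based subset selection in general, the following is a minimal sketch of a greedy MaxMin-style procedure written with NumPy only. It does not use or reproduce Selector's API; the function name maxmin_select, its parameters, and the toy coordinates are assumptions made for this example.

```python
import numpy as np

def maxmin_select(points, n_select, start_index=0):
    """Greedily pick points that maximise the distance to the current subset.

    Hypothetical helper for illustration; not part of the Selector package.
    """
    points = np.asarray(points, dtype=float)
    if not 0 < n_select <= len(points):
        raise ValueError("n_select must be between 1 and the number of points")
    selected = [start_index]
    # Distance from every point to its nearest already-selected point.
    min_dist = np.linalg.norm(points - points[start_index], axis=1)
    while len(selected) < n_select:
        next_index = int(np.argmax(min_dist))
        selected.append(next_index)
        new_dist = np.linalg.norm(points - points[next_index], axis=1)
        min_dist = np.minimum(min_dist, new_dist)
    return selected

# Toy 2D data: two near-duplicates and three well-separated points.
toy_points = np.array([
    [0.0, 0.0],
    [0.1, 0.0],
    [3.0, 0.0],
    [0.0, 2.0],
    [0.1, 2.1],
])
toy_subset = maxmin_select(toy_points, n_select=3)
```

On this toy input the near-duplicate of the starting point is skipped in favour of the points that are farthest from the growing subset.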
Fully dynamic sampling of host-guest inclusion remains difficult: conventional docking and molecular dynamics simulations can sample inclusion, but crystal-like binding is typically stochastic and difficult to reproduce. Here, we introduce DFDD (Distance-Guided Fully Dynamic Docking), a cloud-ready implementation of the LB-PaCS-MD framework designed to capture inclusion processes via unbiased molecular dynamics in explicit solvent. DFDD automates system setup, parameter generation, iterative short-cycle MD sampling, and trajectory analysis within a single workflow that runs on Google Colab without any installation. Progress toward complexation is guided only by the host-guest center-of-mass distance, allowing force-free exploration of insertion pathways and enabling the recovery of both stable and transient binding modes. Using β-cyclodextrin as a representative host, DFDD reproduces experimentally observed inclusion geometries within minutes and reveals intermediate states along the insertion route. Optional coupling with pKaNET-Cloud provides pH-aware, stereochemically consistent ligand protonation states prior to simulation, supporting robust host-guest modeling. This Application Note provides a transparent and accessible platform for efficient host-guest complexation studies. The DFDD framework is publicly available at https://github.com/nyelidl/DFDD.
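The distance-guided idea can be sketched as follows, assuming a selection rule in which the snapshots with the shortest host-guest center-of-mass distance are reused as starting points for the next cycle of short simulations. This is not DFDD's code: the function names, the number of seeds, and the toy coordinates are hypothetical.

```python
import numpy as np

def center_of_mass(coords, masses):
    """Mass-weighted mean position of a set of atoms."""
    coords = np.asarray(coords, dtype=float)
    masses = np.asarray(masses, dtype=float)
    return (coords * masses[:, None]).sum(axis=0) / masses.sum()

def com_distance(host_coords, host_masses, guest_coords, guest_masses):
    """Distance between host and guest centers of mass."""
    host_com = center_of_mass(host_coords, host_masses)
    guest_com = center_of_mass(guest_coords, guest_masses)
    return float(np.linalg.norm(host_com - guest_com))

def select_seeds(distances, n_seeds):
    """Indices of the n_seeds snapshots with the smallest distance."""
    order = np.argsort(np.asarray(distances, dtype=float), kind="stable")
    return order[:n_seeds].tolist()

# Toy two-atom host and guest (illustrative coordinates and masses).
toy_distance = com_distance(
    host_coords=[[0.0, 0.0, 0.0], [2.0, 0.0, 0.0]], host_masses=[1.0, 1.0],
    guest_coords=[[1.0, 0.0, 3.0], [1.0, 0.0, 5.0]], guest_masses=[3.0, 1.0],
)

# Pretend distances from four snapshots of one cycle; keep the two closest.
toy_seeds = select_seeds([5.0, 2.5, 4.0, 1.0], n_seeds=2)
```

No biasing force enters this step: the distance is only used to decide which unbiased snapshots to continue from.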
Kinases are pivotal cell signaling regulators and prominent drug targets. Short peptide substrates are widely used in kinase activity assays essential for investigating kinase biology and drug discovery. However, designing substrates with high activity and specificity remains challenging. Here, we present Subtimizer (substrate optimizer), a streamlined computational pipeline for structure-guided kinase peptide substrate design using AlphaFold-Multimer for structure modeling, ProteinMPNN for sequence design, and AlphaFold2-based interface evaluation. When the pipeline was applied to five kinases, the designed peptides showed substantially improved activity (up to 350%) for four of them. Kinetic analyses revealed >2-fold reductions in the Michaelis constant (Km), indicating improved enzyme-substrate affinity. Designed peptides for MET and ROS1 exhibited reciprocal selectivity, with 4-fold and 11-fold preferences for their intended targets, respectively. This study demonstrates AI-driven structure-guided protein design as an effective approach for developing potent and selective kinase substrates, facilitating assay development for drug discovery and functional investigation of the kinome.
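To make the effect of a lower Km concrete, here is a small worked example using the standard Michaelis-Menten rate law, v = Vmax [S] / (Km + [S]). The numerical values are arbitrary illustrations, not measurements from the study, and the function name is hypothetical.

```python
def michaelis_menten_rate(substrate_conc, vmax, km):
    """Initial reaction rate under the Michaelis-Menten rate law."""
    return vmax * substrate_conc / (km + substrate_conc)

# Arbitrary illustrative values (same units for [S] and Km).
vmax = 1.0
substrate_conc = 10.0

rate_original = michaelis_menten_rate(substrate_conc, vmax, km=10.0)
rate_halved_km = michaelis_menten_rate(substrate_conc, vmax, km=5.0)
```

At a substrate concentration equal to the original Km, halving Km raises the rate from one half to two thirds of Vmax; the gain is larger at lower substrate concentrations and vanishes as the enzyme approaches saturation.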
Discovering novel molecules within the vast chemical space is a central scientific challenge, increasingly delegated to deep generative models. However, the prevailing "black box" paradigm, built on continuous latent spaces, faces a fundamental mismatch between smooth optimization landscapes and inherently discrete molecular structures, often limiting global exploration. To overcome these limitations, we introduce Janus, a framework that recasts molecular design as a transparent, physics-inspired combinatorial optimization problem. At its core, Janus employs a Transformer-based autoencoder with a regularized binary bottleneck to map molecules into a compact discrete latent space. This representation enables the reformulation of molecular generation and optimization as a Quadratic Unconstrained Binary Optimization (QUBO) problem. This approach unlocks synergistic capabilities. For molecular generation, Janus leverages classical and quantum annealers to efficiently traverse the structured energy landscape, enabling the global discovery of diverse chemical scaffolds. Crucially, for molecular optimization, it moves beyond blind search by utilizing quantifiable feature interactions as machine-discovered SAR rules. This allows for rational, interpretable optimization by selectively modifying latent bits to enhance properties. Benchmarking against state-of-the-art methods reveals that this approach achieves superior multiobjective performance while preserving scaffold integrity, avoiding the structural fragmentation common in heuristic baselines. We validate the feasibility of the workflow on a quantum annealer and demonstrate its efficacy in drug-like property optimization. By unifying powerful combinatorial exploration with deep model interpretability, Janus establishes a robust framework for rational, quantum-assisted molecular design.
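For readers unfamiliar with the QUBO form, the sketch below shows what such a problem looks like: minimise E(x) = xᵀQx over binary vectors x. It solves a three-bit toy instance by exhaustive enumeration, which stands in for the annealers mentioned above and is only feasible for very small problems. The matrix toy_Q and the function names are assumptions for illustration and are unrelated to Janus's learned latent space.

```python
import itertools
import numpy as np

def qubo_energy(bits, Q):
    """QUBO objective E(x) = x^T Q x for a binary vector x."""
    x = np.asarray(bits)
    return float(x @ Q @ x)

def brute_force_qubo(Q):
    """Enumerate all 2^n bit strings and return the lowest-energy one."""
    n = Q.shape[0]
    best_bits, best_energy = None, float("inf")
    for bits in itertools.product((0, 1), repeat=n):
        energy = qubo_energy(bits, Q)
        if energy < best_energy:
            best_bits, best_energy = bits, energy
    return best_bits, best_energy

# Toy upper-triangular Q: diagonal terms reward setting a bit,
# off-diagonal terms penalise setting neighbouring bits together.
toy_Q = np.array([
    [-1.0, 2.0, 0.0],
    [0.0, -1.0, 2.0],
    [0.0, 0.0, -1.0],
])
toy_bits, toy_energy = brute_force_qubo(toy_Q)
```

The off-diagonal entries play the role of pairwise interactions between bits, which is what makes the interaction terms directly inspectable in a QUBO formulation.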
The rapid proliferation of AI/ML models in drug discovery heralds an era of extraordinary progress but also raises urgent questions about whether the true predictive performance is as good as advertised. On-target prediction models often benefit from high-resolution structural or atomistic representations that capture the subtleties of binding affinity and pose. In contrast, off-target and ADMET liability models have typically relied on more implicit representations of molecular interactions. Retrospective benchmarks often provide a misleading picture of how successful these diverse representations are at predicting properties, and the community lacks standardized, prospective comparisons. Blind challenges, such as the OpenADMET × ASAP × PolarisHub Challenge featured in this issue, are crucial for realistically evaluating progress, encouraging iteration, and directing collective efforts toward major accuracy barriers. With ongoing investment in large-scale open data creation and community-led challenges, predictive modeling is poised to rapidly transform drug discovery by enabling accurate, multiparameter optimization.
We validate semiempirical sTDA-xTB and sTD-DFT-xTB methods for high-throughput screening of thermally activated delayed fluorescence (TADF) emitters using 747 experimentally characterized molecules, the largest such benchmark to date. Our framework achieves >99% computational cost reduction versus TD-DFT while maintaining strong internal consistency (Pearson r ≈ 0.82) and reasonable agreement with 312 experimental singlet-triplet gaps (MAE ≈ 0.17 eV). Large-scale analysis statistically validates key design principles: D-A-D architectures outperform other motifs, and optimal torsional angles of 50°-90° maximize TADF efficiency, while PCA confirms a low-dimensional property space. This work establishes xTB methods as cost-effective tools for accelerating TADF discovery.
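The two agreement metrics quoted above, Pearson r and mean absolute error (MAE), can be computed as in the following sketch. The gap values are made-up toy numbers in eV, not benchmark data, and the variable names are hypothetical.

```python
import numpy as np

def mean_absolute_error(reference, predicted):
    """Average absolute deviation between two equally long series."""
    reference = np.asarray(reference, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.mean(np.abs(reference - predicted)))

def pearson_r(x, y):
    """Pearson correlation coefficient via the correlation matrix."""
    return float(np.corrcoef(x, y)[0, 1])

# Toy singlet-triplet gaps in eV (illustrative only).
toy_computed_gaps = [0.1, 0.2, 0.3, 0.4]
toy_experimental_gaps = [0.2, 0.2, 0.4, 0.4]

toy_mae = mean_absolute_error(toy_experimental_gaps, toy_computed_gaps)
toy_r = pearson_r(toy_computed_gaps, toy_experimental_gaps)
```

The two numbers answer different questions: r measures whether the ranking and trend are preserved, which matters most for screening, while MAE measures the absolute error in the predicted gap.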
Closely related membrane transporters can diverge sharply in their modes of transport despite minimal sequence differences, underscoring how minor structural features can alter the transport function. This divergence is exemplified in nitrate and nitrite transport across bacterial membranes, which supports anaerobic respiration and involves the major facilitator superfamily (MFS) transporters NarK and NarU. NarK operates as a nitrate/nitrite antiporter, whereas NarU's mechanism remains unresolved, with evidence suggesting potential symport activity. Using extensive adaptive molecular dynamics simulations and Markov State Modeling, we mapped NarU's conformational free-energy landscape and assessed how its behavior contrasts with mechanistic principles established for NarK. NarU follows a similar gating pathway but displays pronounced asymmetry favoring the outward-facing state and stabilizes an apo-occluded intermediate inaccessible to antiporters. This state arises from rotation of an arginine gating pair and a hinged glycine substitution that enhances gate flexibility. These sequence-dependent adaptations alter gating energetics and reprogram the scaffold to permit coupled cotransport. Our results show that the presence of a few strategic residue substitutions in the binding pocket and translocation pathway could alter the transport mechanism of transporters with high sequence and structural similarity.
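The link between a Markov state model and a free-energy landscape can be sketched in a few lines: the stationary distribution π of the transition matrix gives relative free energies F_i = −kT ln π_i. The two-state transition matrix below is a made-up toy example, not a model of NarU, and the function names are hypothetical.

```python
import numpy as np

def stationary_distribution(transition_matrix):
    """Left eigenvector of a row-stochastic matrix for eigenvalue 1."""
    eigvals, eigvecs = np.linalg.eig(np.asarray(transition_matrix).T)
    index = np.argmin(np.abs(eigvals - 1.0))
    pi = np.real(eigvecs[:, index])
    return pi / pi.sum()  # normalise so the populations sum to 1

def free_energies(pi, kT=1.0):
    """Relative free energies in units of kT, shifted so the minimum is 0."""
    f = -kT * np.log(pi)
    return f - f.min()

# Toy two-state model: state 0 is left rarely, state 1 is left more often.
toy_T = np.array([
    [0.9, 0.1],
    [0.3, 0.7],
])
toy_pi = stationary_distribution(toy_T)
toy_F = free_energies(toy_pi)
```

In this toy model state 0 holds three quarters of the population, so state 1 lies ln 3 ≈ 1.1 kT above it; an asymmetry between conformational states appears in the same way as a free-energy difference.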
Accurate and rapid prediction of protein-ligand binding affinities is critical for drug discovery, particularly when evaluating large chemical libraries or new drug molecules from high-throughput generative models. We present UCBbind, a hybrid framework that combines a similarity-based transfer module with a deep-learning-based prediction module to efficiently estimate binding affinities of small molecules to target proteins. For each query protein-ligand pair, UCBbind transfers experimental data from highly similar reference pairs when available and applies the prediction module when no sufficiently similar reference exists. We benchmarked UCBbind on multiple datasets, including the CASF-2016 set, the HiQBind dataset post 2020, and the COVID Moonshot database. Our results show that UCBbind achieves state-of-the-art predictive performance, particularly for test entries with high similarity to well-characterized reference proteins and ligands, and can support downstream tasks such as binding site prediction and binder/nonbinder classification.
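The transfer-or-predict logic described above can be sketched as a simple dispatch. This is not UCBbind's implementation: the Reference class, the similarity callables, the cutoff values of 0.9, and the rule for ranking references are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Sequence, Tuple

@dataclass
class Reference:
    protein_id: str
    ligand_id: str
    affinity: float  # experimental value, e.g. on a pKd-like scale

def hybrid_affinity(
    query_protein: str,
    query_ligand: str,
    references: Sequence[Reference],
    protein_sim: Callable[[str, str], float],
    ligand_sim: Callable[[str, str], float],
    predictor: Callable[[str, str], float],
    protein_cutoff: float = 0.9,  # hypothetical threshold
    ligand_cutoff: float = 0.9,   # hypothetical threshold
) -> Tuple[float, str]:
    """Return (affinity, source), where source is 'transfer' or 'predict'."""
    best_ref, best_score = None, -1.0
    for ref in references:
        p_sim = protein_sim(query_protein, ref.protein_id)
        l_sim = ligand_sim(query_ligand, ref.ligand_id)
        # A reference qualifies only if both similarities pass their cutoffs.
        if p_sim >= protein_cutoff and l_sim >= ligand_cutoff:
            score = min(p_sim, l_sim)
            if score > best_score:
                best_ref, best_score = ref, score
    if best_ref is not None:
        return best_ref.affinity, "transfer"
    return predictor(query_protein, query_ligand), "predict"

# Toy stand-ins: exact-match similarity and a constant predictor.
def toy_similarity(a: str, b: str) -> float:
    return 1.0 if a == b else 0.0

def toy_predictor(protein: str, ligand: str) -> float:
    return 5.0

toy_references = [Reference("P1", "L1", 7.2)]
toy_hit = hybrid_affinity("P1", "L1", toy_references,
                          toy_similarity, toy_similarity, toy_predictor)
toy_miss = hybrid_affinity("P2", "L1", toy_references,
                           toy_similarity, toy_similarity, toy_predictor)
```

Returning the source alongside the value makes it easy to report performance separately for transferred and predicted entries, in line with the observation that accuracy is highest for queries close to well-characterized references.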

