Visceral leishmaniasis caused by Leishmania infantum remains a lethal disease with few therapeutic options, necessitating innovative computational approaches to accelerate drug discovery. Here, we present a graph neural network (GNN) framework incorporating well-established multiscale mechanisms to improve the identification of novel antileishmanial compounds. On two antileishmanial classification data sets, our GNNs demonstrated significant improvements in predictive performance over default GNNs, increasing the area under the receiver operating characteristic curve (AUC) by 2.2–29.2% on the imbalanced data set (activity cutoff: 1 μM) and by 3.4–22.5% on the balanced data set (activity cutoff: 10 μM). We then applied the framework to screen a library of approximately 1.3 million compounds, identifying LC-61 as a potent antileishmanial agent with nanomolar activity against intracellular L. infantum (IC50 = 0.076 μM) and minimal cytotoxicity toward macrophages (THP-1 CC50 = 157 μM). Comprehensive in vitro ADME profiling showed that LC-61 combines high solubility at both acidic and physiological pH (>28 μg/mL), balanced lipophilicity (eLogD = 4.07), and favorable passive permeability (PAMPA = 4.86 × 10⁻⁶ cm/s), although its microsomal stability is comparatively low. Overall, our GNN framework accelerated the discovery of LC-61, a novel, biologically validated hit suitable for hit-to-lead optimization.
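The two activity cutoffs and the AUC comparison can be made concrete with a short Python sketch. It assumes that a compound is labeled active when its IC50 is at or below the cutoff, and that the reported AUC increases are relative (a percentage of the default-GNN AUC) rather than absolute percentage points; the abstract states neither convention. The IC50 values and model scores below are invented for illustration, and the helper names (`binarize`, `roc_auc`, `relative_increase`, `selectivity_index`) are hypothetical. The selectivity index (CC50/IC50) is not reported above; it is derived here only from the stated THP-1 CC50 and intracellular IC50 of LC-61.

```python
from itertools import product

def binarize(ic50_um, cutoff_um):
    """Label each compound active (1) if its IC50 (μM) is at or below the cutoff, else 0.
    Assumption: the source does not say whether the cutoff is inclusive."""
    return [1 if v <= cutoff_um else 0 for v in ic50_um]

def roc_auc(labels, scores):
    """ROC AUC as the probability that a random active outscores a random inactive (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one active and one inactive compound")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p, n in product(pos, neg))
    return wins / (len(pos) * len(neg))

def relative_increase(new, baseline):
    """Relative change in percent (assumed reading of the reported AUC increases)."""
    return 100.0 * (new - baseline) / baseline

def selectivity_index(cc50_um, ic50_um):
    """Host-cell CC50 divided by antiparasitic IC50; higher means more selective."""
    return cc50_um / ic50_um

# Hypothetical toy data: five compounds, IC50 in μM.
ic50 = [0.05, 0.8, 3.0, 12.0, 40.0]
print(binarize(ic50, 1.0))   # [1, 1, 0, 0, 0]  (1 μM cutoff)
print(binarize(ic50, 10.0))  # [1, 1, 1, 0, 0]  (10 μM cutoff)

labels = binarize(ic50, 1.0)
auc_default = roc_auc(labels, [0.6, 0.4, 0.5, 0.3, 0.2])   # 5/6
auc_improved = roc_auc(labels, [0.9, 0.7, 0.5, 0.3, 0.1])  # 1.0
print(round(relative_increase(auc_improved, auc_default), 1))  # 20.0

# Derived from the LC-61 values in the text: 157 / 0.076
print(round(selectivity_index(157, 0.076)))  # 2066
```

Under these assumptions the same AUC gain reads very differently as a relative or an absolute change, which is why the convention matters when comparing the reported ranges.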
The rapid growth of the enzyme reaction literature has created a major bottleneck in database curation, leaving vast amounts of enzyme-substrate-condition relationships unstructured and inaccessible to deep learning (DL)-driven modeling. Making full use of enzymatic reaction data is therefore an important prerequisite for accurate enzyme activity prediction models. Current DL-based data extraction models rely heavily on large language models (LLMs) and lack both fidelity checks and the ability to evolve continuously. To address these issues, we developed zERExtractor (Zelixir's Enzyme Reaction Data Extractor), an accuracy-oriented and extensible platform for extracting enzyme-catalyzed reaction data from scientific publications. The system provides a unified multimodal information extraction framework, covering molecular reaction diagrams, tables, and text, that integrates enzymatic reaction descriptors into structured storage. We combine fine-tuned LLMs with DL models in a human-in-the-loop pipeline that evolves through expert validation of data fidelity and through active learning. zERExtractor achieves 89.9% accuracy in table recognition and over 98% accuracy in molecular image recognition on synthetic data sets, outperforming the strongest baseline by more than 2%, and consistently maintains accuracy above 95% on realistic benchmarks. By providing a scalable framework for accurate multimodal extraction, zERExtractor bridges the data gap in enzyme reaction data, advancing DL-driven enzyme modeling and enabling future applications in computational enzymology and biotechnology. The platform is publicly accessible online at https://zpaper.zelixir.com/.
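The abstract does not say how records are chosen for expert validation. As a hedged sketch only, the Python code below assumes least-confidence sampling, a common active-learning strategy: records whose extraction confidence falls below a threshold are ranked from least to most confident, the lowest-ranked ones (up to a review budget) are sent to experts, and the remaining confident records are accepted automatically. The `ReactionRecord` schema, its `confidence` field, the 0.9 threshold, and the functions `triage` and `mark_validated` are all hypothetical and are not zERExtractor's actual data model or API.

```python
from dataclasses import dataclass, field

@dataclass
class ReactionRecord:
    """Hypothetical structured record for one extracted enzyme-catalyzed reaction."""
    enzyme: str
    substrate: str
    conditions: dict = field(default_factory=dict)  # e.g. {"pH": 7.5, "temperature_C": 30}
    modality: str = "text"    # extraction source: "text", "table", or "image"
    confidence: float = 1.0   # extraction-model confidence in [0, 1]

def triage(records, budget, threshold=0.9):
    """One human-in-the-loop round using least-confidence sampling.

    Returns (accepted, review_queue): records at or above the threshold are accepted;
    the least confident records below it, up to `budget`, go to experts.
    Uncertain records beyond the budget are left for a later round.
    """
    accepted = [r for r in records if r.confidence >= threshold]
    uncertain = sorted((r for r in records if r.confidence < threshold),
                       key=lambda r: r.confidence)
    return accepted, uncertain[:budget]

def mark_validated(record, corrected_fields):
    """Apply an expert's corrections and mark the record as fully trusted.
    Validated records could then join the training pool for the next fine-tuning round."""
    for name, value in corrected_fields.items():
        if not hasattr(record, name):
            raise AttributeError(f"unknown field: {name}")
        setattr(record, name, value)
    record.confidence = 1.0
    return record

# Hypothetical usage with invented records.
records = [
    ReactionRecord("enzyme A", "substrate 1", {"pH": 7.0}, "text", 0.97),
    ReactionRecord("enzyme B", "substrate 2", {}, "table", 0.42),
    ReactionRecord("enzyme C", "substrate 3", {}, "image", 0.75),
    ReactionRecord("enzyme D", "substrate 4", {}, "text", 0.88),
]
accepted, queue = triage(records, budget=2)
print([r.enzyme for r in accepted])  # ['enzyme A']
print([r.enzyme for r in queue])     # ['enzyme B', 'enzyme C']
fixed = mark_validated(queue[0], {"conditions": {"pH": 8.0}})
print(fixed.confidence)  # 1.0
```

Capping each round with a budget keeps expert effort bounded, while the lowest-confidence records, which are most likely to contain extraction errors, are reviewed first.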
The extensive use of pesticides and synthetic dyes poses critical threats to food safety, human health, and environmental sustainability, necessitating rapid and reliable detection methods. Raman spectroscopy provides molecule-specific fingerprints but suffers from spectral noise, fluorescence background, and band overlap, which limit its real-world applicability. Here, we propose MLRaman, a deep learning framework that combines ResNet-18 feature extraction with classifiers, including XGBoost, SVM, and a hybrid of the two, to detect pesticides and dyes from Raman spectra. With the CNN-XGBoost model, MLRaman achieved a predictive accuracy of 97.4% and a perfect AUC of 1.0, while the CNN-SVM model delivered competitive results with robust class-wise discrimination. Dimensionality reduction analyses (PCA, t-SNE, UMAP) confirmed the separability of the Raman embeddings across 10 analytes (7 pesticides and 3 dyes). Finally, we developed a user-friendly Streamlit application for real-time prediction, which correctly identified previously unseen Raman spectra from our own independent experiments and from the literature, underscoring its strong generalization capacity. This study establishes MLRaman as a scalable, practical model for multiresidue contaminant monitoring, with significant potential for deployment in food safety and environmental surveillance.
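The abstract does not specify how the XGBoost and SVM outputs are combined in the hybrid model. One common option is weighted soft voting, sketched below in plain Python under that assumption: each classifier, trained on the ResNet-18 embeddings, is assumed to output a probability for every analyte class; the two probability vectors are averaged; and the class with the highest fused probability is predicted. The class labels, probabilities, equal 0.5 weight, and the helper names `soft_vote` and `predict` are hypothetical.

```python
def soft_vote(probs_a, probs_b, weight_a=0.5):
    """Weighted average of two classifiers' per-class probability vectors."""
    if len(probs_a) != len(probs_b):
        raise ValueError("both classifiers must score the same classes")
    if not 0.0 <= weight_a <= 1.0:
        raise ValueError("weight_a must lie in [0, 1]")
    return [weight_a * pa + (1.0 - weight_a) * pb for pa, pb in zip(probs_a, probs_b)]

def predict(fused, labels):
    """Return the label with the highest fused probability and that probability."""
    best = max(range(len(fused)), key=lambda i: fused[i])
    return labels[best], fused[best]

# Hypothetical 3-class example (the study itself uses 10 analytes).
labels = ["pesticide_1", "pesticide_2", "dye_1"]
p_xgb = [0.6, 0.3, 0.1]   # assumed XGBoost class probabilities for one spectrum
p_svm = [0.2, 0.7, 0.1]   # assumed SVM class probabilities (e.g. Platt-scaled)
fused = soft_vote(p_xgb, p_svm)
label, p = predict(fused, labels)
print(label, round(p, 2))  # pesticide_2 0.5
```

Soft voting lets a confident classifier outweigh an uncertain one, unlike majority (hard) voting, which with only two models cannot break a disagreement; the weight could be tuned on a validation set.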

