Jianming Mao, Yongkang Xi, Armin Shayesteh Zadeh, Allen P. Liu and Andrew L. Ferguson
Synthetic cells are prevalent models for understanding and recapitulating complicated functions of natural cells such as DNA replication and protein expression. Lipid-based vesicles are widely employed but are limited by their fragility under mechanical forces or osmotic pressure. Elastin-like polypeptides (ELPs) composed of repetitive (VPGXG) sequences present alternative building blocks with which to construct the delimiting membrane of synthetic cells possessing high structural stability and tolerance of harsh environmental stress. In this work, we present a high-throughput virtual screening pipeline combining coarse-grained simulations, alchemical free energy calculations, Gaussian process regression, and Bayesian optimization to traverse a library of amphiphilic diblock ELPs for mutant sequences predicted to form thermodynamically stable bilayer vesicles. From our screening campaign, we have identified a range of novel ELP candidates with enhanced predicted stability. Analysis of our screening data exposes new rational design principles that suggest incorporating particular guest residues in hydrophilic blocks – including histidine, tyrosine, and threonine – and in hydrophobic blocks – including alanine, phenylalanine, cysteine, and isoleucine – to enhance the thermodynamic stability of ELP bilayer vesicles. The computational pipeline greatly accelerates the discovery of ELP building blocks for synthetic cells, exposes new design principles for these molecules, and furnishes a transferable framework for designing peptides with desirable structural or functional properties.
Title: Computational design of polypeptide-based compartments for synthetic cells. Journal: Digital Discovery, volume 1, pages 214-230. Published: 2025-11-12. DOI: 10.1039/D5DD00291E (https://doi.org/10.1039/D5DD00291E). PDF: https://pubs.rsc.org/en/content/articlepdf/2026/dd/d5dd00291e?page=search
Pau Rocabert-Oriols, Camilla Lo Conte, Núria López and Javier Heras-Domingo
Identifying molecular structures from vibrational spectra is central to chemical analysis but remains challenging due to spectral ambiguity and the limitations of single-modality methods. While deep learning has advanced various spectroscopic characterization techniques, leveraging the complementary nature of infrared (IR) and Raman spectroscopies remains largely underexplored. We introduce VibraCLIP, a contrastive learning framework that embeds molecular graphs, IR and Raman spectra into a shared latent space. A lightweight fine-tuning protocol ensures generalization from theoretical to experimental datasets. VibraCLIP enables accurate, scalable, and data-efficient molecular identification, linking vibrational spectroscopy with structural interpretation. This tri-modal design captures rich structure–spectra relationships, achieving Top-1 retrieval accuracy of 81.7% and reaching 98.9% Top-25 accuracy with molecular mass integration. By integrating complementary vibrational spectroscopic signals with molecular representations, VibraCLIP provides a practical framework for automated spectral analysis, with potential applications in fields such as synthesis monitoring, drug development, and astrochemical detection.
Title: Multi-modal contrastive learning for chemical structure elucidation with VibraCLIP. Journal: Digital Discovery, volume 12, pages 3818-3827. Published: 2025-11-11. DOI: 10.1039/D5DD00269A (https://doi.org/10.1039/D5DD00269A). PDF: https://pubs.rsc.org/en/content/articlepdf/2025/dd/d5dd00269a?page=search
Fullerenes, carbon-based nanomaterials with sp²-hybridized carbon atoms arranged in polyhedral cages, exhibit diverse isomeric structures with promising applications in optoelectronics, solar cells, and medicine. However, the vast number of possible fullerene isomers complicates efficient property prediction. In this study, we introduce FullereneNet, a graph neural network-based model that predicts fundamental properties of fullerenes using topological features derived solely from unoptimized structures, eliminating the need for computationally expensive quantum chemistry optimizations. The model leverages topological representations based on the chemical environments of pentagonal and hexagonal rings, enabling efficient capture of local structural details. We show that this approach yields superior performance in predicting the C–C binding energy for a wide range of fullerene sizes, achieving mean absolute errors of 3 meV per atom for C₆₀, 4 meV per atom for C₇₀, and 6 meV per atom for C₇₂–C₁₀₀, surpassing the values of the state-of-the-art machine learning interatomic potential GAP-20. Additionally, the FullereneNet model accurately predicts 11 other properties, including the HOMO–LUMO gap and solvation free energy, demonstrating robustness and transferability across fullerene types. This work provides a computationally efficient framework for high-throughput screening of fullerene candidates and establishes a foundation for future data-driven studies in fullerene chemistry.
Title: Extrapolating beyond C₆₀: advancing prediction of fullerene isomers with FullereneNet. Authors: Bin Liu, Jirui Jin and Mingjie Liu. Journal: Digital Discovery, volume 1, pages 123-133. Published: 2025-11-11. DOI: 10.1039/D5DD00241A (https://doi.org/10.1039/D5DD00241A). PDF: https://pubs.rsc.org/en/content/articlepdf/2026/dd/d5dd00241a?page=search
Shogo Tadokoro, Ryosuke Kamimura, Fumitaka Ishiwari and Akinori Saeki
Improving the performance of organic photovoltaics (OPVs) depends on the development of new p-type polymers and n-type non-fullerene acceptor (NFA) molecules. However, conventional experimental and theoretical methods are inefficient for exploring the vast chemical space. In this report, we use machine learning (ML) to explore simple-structured p-type polymers. Structural simplicity is associated with fewer synthesis steps, which matters for low-cost, large-scale production. By considering the structural simplicity (estimated crudely from the molecular weight of the repeating unit) of 200,000 virtually generated polymers, together with their synthetic accessibility, we focus on copolymers composed of benzoxadiazole as an acceptor and thiophene (or phenylene) as a donor. Although the structures of these copolymers resemble the high-performance, simple-structured polymer PTQ10, their structural symmetries (regioregularity) are modified for synthetic reasons. Through the characterization of the synthesized polymers, their OPV devices blended with the Y6 NFA, and the resultant synthetic complexity scores, we show that our polymer with a minor manual modification of the donor and alkyl chain exhibits a power conversion efficiency of 5.56%, which closely aligns with that predicted by ML and provides a basis for the further development of novel polymers with low synthesis and search costs.
Title: Design of simple-structured conjugated polymers for organic solar cells by machine learning-assisted structural modification and experimental validation. Journal: Digital Discovery, volume 12, pages 3774-3781. Published: 2025-11-11. DOI: 10.1039/D5DD00418G (https://doi.org/10.1039/D5DD00418G). PDF: https://pubs.rsc.org/en/content/articlepdf/2025/dd/d5dd00418g?page=search
Transforming in situ transmission electron microscopy (TEM) imaging into a tool for spatially-resolved operando characterization of solid-state reactions requires automated, high-precision semantic segmentation of dynamically evolving features. However, traditional deep learning methods for semantic segmentation often face limitations due to the scarcity of labeled data, visually ambiguous features of interest, and scenarios involving small objects. To tackle these challenges, we introduce MultiTaskDeltaNet (MTDN), a novel deep learning architecture that creatively reconceptualizes the segmentation task as a change detection problem. By implementing a unique Siamese network with a U-Net backbone and using paired images to capture feature changes, MTDN effectively leverages minimal data to produce high-quality segmentations. Furthermore, MTDN utilizes a multi-task learning strategy to exploit correlations between physical features of interest. In an evaluation using data from in situ environmental TEM (ETEM) videos of filamentous carbon gasification, MTDN demonstrated a significant advantage over conventional segmentation models, particularly in accurately delineating fine structural features. Notably, MTDN achieved a 10.22% performance improvement over conventional segmentation models in predicting small and visually ambiguous physical features. This work bridges key gaps between deep learning and practical TEM image analysis, advancing automated characterization of nanomaterials in complex experimental settings.
Title: MultiTaskDeltaNet: change detection-based image segmentation for operando ETEM with application to carbon gasification kinetics. Authors: Yushuo Niu, Tianyu Li, Yuanyuan Zhu and Qian Yang. Journal: Digital Discovery, volume 1, pages 290-303. Published: 2025-11-11. DOI: 10.1039/D5DD00333D (https://doi.org/10.1039/D5DD00333D). PDF: https://pubs.rsc.org/en/content/articlepdf/2026/dd/d5dd00333d?page=search
Thomas M. Bickley, Angus Mingare, Tim Weaving, Michael Williams de la Bastida, Shunzhou Wan, Martina Nibbi, Philipp Seitz, Alexis Ralli, Peter J. Love, Minh Chung, Mario Hernández Vera, Laura Schulz and Peter V. Coveney
The advent of hybrid computing platforms consisting of quantum processing units integrated with conventional high-performance computing brings new opportunities for algorithm design. By strategically offloading select portions of the workload to classical hardware where tractable, we may broaden the applicability of quantum computation in the near term. In this perspective, we review techniques that facilitate the study of subdomains of chemical systems with quantum computers and present a proof-of-concept demonstration of quantum-selected configuration interaction deployed within a multiscale/multiphysics simulation workflow leveraging classical molecular dynamics, projection-based embedding and qubit subspace tools. This allows the technology to be utilised for simulating systems of real scientific and industrial interest, which not only brings true quantum utility closer to realisation but is also relevant as we look forward to the fault-tolerant regime.
Title: Extending quantum computing through subspace, embedding and classical molecular dynamics techniques. Journal: Digital Discovery, volume 12, pages 3427-3444. Published: 2025-11-10. DOI: 10.1039/D5DD00225G (https://doi.org/10.1039/D5DD00225G). PDF: https://pubs.rsc.org/en/content/articlepdf/2025/dd/d5dd00225g?page=search
Luis H. M. Torres, Sofia M. da Silva, Joel P. Arrais, Catarina Pimentel and Bernardete Ribeiro
The Ames mutagenicity test serves as a cornerstone for evaluating the mutagenic potential of chemical compounds, which is critical in drug discovery and safety assessments. However, existing computational methods struggle to utilize the contribution of individual bacterial strains used in the Ames test, limiting the accuracy of overall mutagenicity predictions. To address this, we introduce Meta-GTMP, a few-shot learning framework that combines graph neural networks (GNNs) and Transformers to integrate the local molecular graph structure with the global information in graph embedding representations for mutagenicity prediction using limited labeled data. A multi-task meta-learning strategy further optimizes the model parameters across individual strain-specific few-shot tasks, leveraging their complementarity to predict the overall Ames result. Computational experiments conducted on the ISSSTY v1-a dataset demonstrate that Meta-GTMP outperforms standard graph-based models, achieving notable improvements in sensitivity (+6.82%) and ROC-AUC score (+2.50%). Laboratory validation tests using six chemically diverse compounds with unknown mutagenicity labels confirmed the model's effectiveness, achieving high accuracy in distinguishing mutagenic and non-mutagenic samples. Importantly, Meta-GTMP makes explainable predictions through a node-edge attribute masking strategy, identifying significant molecular substructures responsible for mutagenicity. These insights are essential in drug discovery, positioning Meta-GTMP as a robust and explainable tool for using mutagenicity predictions to enhance the identification, selection and rational design of safer and more effective potential drug candidates.
Title: Advancing mutagenicity predictions in drug discovery with an explainable few-shot deep learning framework. Journal: Digital Discovery, volume 12, pages 3515-3532. Published: 2025-11-10. DOI: 10.1039/D5DD00276A (https://doi.org/10.1039/D5DD00276A). PDF: https://pubs.rsc.org/en/content/articlepdf/2025/dd/d5dd00276a?page=search
Wenxiang Song, Yuyang Zhang, Le Xiong, Xinmin Li, Jingwei Zhang, Guixia Liu, Weihua Li, Youjun Yang and Yun Tang
With the rapid advancement of fluorescent dye research, there is an urgent need for tools capable of accurately predicting dye optical properties while facilitating structural modification. However, the field currently lacks reliable and user-friendly tools for this purpose. To address this gap, we have developed Fluor-tools—an integrated platform for dye property prediction and structural optimization. The platform comprises two core modules: (1) Fluor-pred, a dye property prediction model that integrates domain-specific knowledge of fluorophores with a label distribution smoothing (LDS) reweighting strategy and an advanced residual lightweight attention (RLAT) architecture. This model achieves state-of-the-art performance in predicting four key photophysical properties of dyes. (2) Fluor-opt, a structural optimization module that employs a matched molecular pair analysis (MMPA) method enhanced with symmetry-aware and environment-adaptive modifications. This module derives 1579 structural transformation rules, enabling the directional optimization of non-NIR (non-near-infrared) dyes to NIR properties. In summary, Fluor-tools provides robust computational support for research in biomedical imaging and optical materials. The platform is freely accessible at https://lmmd.ecust.edu.cn/Fluor-tools/.
Title: Fluor-tools: an integrated platform for dye property prediction and structure optimization. Journal: Digital Discovery, volume 12, pages 3728-3743. Published: 2025-11-10. DOI: 10.1039/D5DD00402K (https://doi.org/10.1039/D5DD00402K). PDF: https://pubs.rsc.org/en/content/articlepdf/2025/dd/d5dd00402k?page=search
Eunjae Shim, Ambuj Tewari, Paul M. Zimmerman and Tim Cernak
Tailoring a reaction condition to suit new substrates can be labor-intensive. While machine learning can aid this endeavor, conventional strategies require large datasets to make useful predictions. Active transfer learning (ATL) tackles this problem by leveraging previously collected reaction data and adaptively selecting reagent combinations. Here, ATL is prospectively applied to find improved reagent combinations for C(sp³)–C(sp³) cross-couplings between activated amines and carboxylic acids. The formation of carbon–carbon bonds from amines and acids is a powerful complement to the classic amide coupling, but the formation of sterically congested secondary alkyl groups studied here represents a challenge for catalysis. Our results demonstrate ATL consistently improved yields within three batches of experiments, making the method of practical utility for chemical space exploration studies, such as drug discovery.
{"title":"Prospective active transfer learning on the formal coupling of amines and carboxylic acids to form secondary alkyl bonds","authors":"Eunjae Shim, Ambuj Tewari, Paul M. Zimmerman and Tim Cernak","doi":"10.1039/D5DD00309A","DOIUrl":"10.1039/D5DD00309A","url":null,"abstract":"<p >Tailoring a reaction condition to suit new substrates can be labor-intensive. While machine learning can aid this endeavor, conventional strategies require large datasets to make useful predictions. Active transfer learning (ATL) tackles this problem by leveraging previously collected reaction data and adaptively selecting reagent combinations. Here, ATL is prospectively applied to find improved reagent combinations for C(sp<small><sup>3</sup></small>)–C(sp<small><sup>3</sup></small>) cross-couplings between activated amines and carboxylic acids. The formation of carbon–carbon bonds from amines and acids is a powerful complement to the classic amide coupling, but the formation of sterically congested secondary alkyl groups studied here represents a challenge for catalysis. Our results demonstrate ATL consistently improved yields within three batches of experiments, making the method of practical utility for chemical space exploration studies, such as drug discovery.</p>","PeriodicalId":72816,"journal":{"name":"Digital discovery","volume":" 12","pages":" 3693-3700"},"PeriodicalIF":6.2,"publicationDate":"2025-11-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12593407/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145483971","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Phuoc-Chung Van Nguyen, Van-Thinh To, Ngoc-Vi Nguyen Tran, Tieu-Long Phan, Tuyen Ngoc Truong, Thomas Gärtner, Daniel Merkle and Peter F. Stadler
Chemical reactions typically follow mechanistic templates and hence fall into a manageable number of clearly distinguishable classes that are usually labeled by names of chemists who discovered or explored them. These “named reactions” form the core of reaction ontologies and are associated with specific synthetic procedures. Classification of chemical reactions, therefore, is an essential step for the construction and maintenance of reaction-template databases, in particular for the purpose of synthetic route planning. Large-scale reaction databases, however, typically do not annotate named reactions systematically. Although many methods have been proposed, most are sensitive to reagent variations and do not guarantee permutation invariance. Here, we propose SynCat, a graph-based framework that leverages molecule-level cross-attention to perform precise reagent detection and role assignment, eliminating unwanted species. SynCat ensures permutation invariance by employing a pairwise summation of participant embeddings. This method balances mechanistic specificity derived from individual-molecule embeddings with the order-independent nature of the pairwise representation. Across multiple benchmark datasets, SynCat outperformed established reaction fingerprints, DRFP and RXNFP, achieving a mean classification accuracy of 0.988, together with enhanced scalability.
{"title":"SynCat: molecule-level attention graph neural network for precise reaction classification","authors":"Phuoc-Chung Van Nguyen, Van-Thinh To, Ngoc-Vi Nguyen Tran, Tieu-Long Phan, Tuyen Ngoc Truong, Thomas Gärtner, Daniel Merkle and Peter F. Stadler","doi":"10.1039/D5DD00367A","DOIUrl":"https://doi.org/10.1039/D5DD00367A","url":null,"abstract":"<p >Chemical reactions typically follow mechanistic templates and hence fall into a manageable number of clearly distinguishable classes that are usually labeled by names of chemists who discovered or explored them. These “named reactions” form the core of reaction ontologies and are associated with specific synthetic procedures. Classification of chemical reactions, therefore, is an essential step for the construction and maintenance of reaction-template databases, in particular for the purpose of synthetic route planning. Large-scale reaction databases, however, typically do not annotate named reactions systematically. Although many methods have been proposed, most are sensitive to reagent variations and do not guarantee permutation invariance. Here, we propose SynCat, a graph-based framework that leverages molecule-level cross-attention to perform precise reagent detection and role assignment, eliminating unwanted species. SynCat ensures permutation invariance by employing a pairwise summation of participant embeddings. This method balances mechanistic specificity derived from individual-molecule embeddings with the order-independent nature of the pairwise representation. Across multiple benchmark datasets, SynCat outperformed established reaction fingerprints, DRFP and RXNFP, achieving a mean classification accuracy of 0.988, together with enhanced scalability.</p>","PeriodicalId":72816,"journal":{"name":"Digital discovery","volume":" 1","pages":" 241-253"},"PeriodicalIF":6.2,"publicationDate":"2025-11-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://pubs.rsc.org/en/content/articlepdf/2026/dd/d5dd00367a?page=search","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146006999","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}