BatGPT-Chem: A Foundation Large Model For Retrosynthesis Prediction
Yifei Yang, Runhan Shi, Zuchao Li, Shu Jiang, Bao-Liang Lu, Yang Yang, Hai Zhao
arXiv - CS - Computational Engineering, Finance, and Science, 2024-08-19 (DOI: arxiv-2408.10285)
Citations: 0
Abstract
Retrosynthesis analysis is pivotal yet challenging in drug discovery and
organic chemistry. Despite the proliferation of computational tools over the
past decade, AI-based systems often fall short in generalizing across diverse
reaction types and exploring alternative synthetic pathways. This paper
presents BatGPT-Chem, a large language model with 15 billion parameters,
tailored for enhanced retrosynthesis prediction. Integrating chemical tasks via
a unified framework of natural language and SMILES notation, this approach
synthesizes extensive instructional data from an expansive chemical database.
Employing both autoregressive and bidirectional training techniques on more
than one hundred million instances, BatGPT-Chem captures a broad spectrum of
chemical knowledge, enabling precise prediction of reaction conditions and
exhibiting strong zero-shot capabilities. Superior to existing AI methods, our
model demonstrates significant advancements in generating effective strategies
for complex molecules, as validated by stringent benchmark tests. BatGPT-Chem
not only boosts the efficiency and creativity of retrosynthetic analysis but
also establishes a new standard for computational tools in synthetic design.
This development empowers chemists to adeptly address the synthesis of novel
compounds, potentially expediting the innovation cycle in drug manufacturing
and materials science. We release our trial platform at
https://www.batgpt.net/dapp/chem.
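
To make the idea of unifying natural language and SMILES notation concrete, the following is a minimal sketch of how one retrosynthesis instruction instance might be built from a reaction SMILES string. The abstract does not specify the prompt template or data schema, so the class `InstructionInstance`, the functions `split_reaction_smiles` and `to_retrosynthesis_instance`, and the prompt wording are all hypothetical illustrations, not the authors' pipeline. The only convention relied on is the standard reaction SMILES layout `reactants>agents>products`, with `.` separating molecules.

```python
from dataclasses import dataclass


@dataclass
class InstructionInstance:
    """Hypothetical container for one natural-language + SMILES training pair."""
    prompt: str
    response: str


def split_reaction_smiles(rxn: str) -> tuple[list[str], list[str], list[str]]:
    """Split 'reactants>agents>products' into three lists of molecule SMILES."""
    parts = rxn.split(">")
    if len(parts) != 3:
        raise ValueError(f"expected exactly two '>' separators, got: {rxn!r}")
    # Molecules within each part are separated by '.'; drop empty entries
    # so that an empty agents field yields an empty list.
    return tuple([mol for mol in part.split(".") if mol] for part in parts)


def to_retrosynthesis_instance(rxn: str) -> InstructionInstance:
    """Turn a forward reaction into a 'product -> reactants' instruction pair.

    The prompt text is an assumed template, chosen only for illustration.
    """
    reactants, _agents, products = split_reaction_smiles(rxn)
    prompt = (
        "Propose reactants that could be used to synthesize the "
        f"following product: {'.'.join(products)}"
    )
    return InstructionInstance(prompt=prompt, response=".".join(reactants))


if __name__ == "__main__":
    # Esterification: acetic acid + ethanol -> ethyl acetate (no agents listed).
    example = to_retrosynthesis_instance("CC(=O)O.OCC>>CC(=O)OCC")
    print(example.prompt)
    print(example.response)
```

In this sketch the product SMILES is embedded in a natural-language question and the reactant SMILES form the target text, which is one plausible way a reaction database could be converted into instruction data; a real pipeline would presumably also validate and canonicalize the SMILES with a cheminformatics toolkit.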