Nesma Mousa, Hristo P. Varbanov, Vidya Kaipanchery, Elisabetta Gabano, Mauro Ravera, Andrey A. Toropov, Larisa Charochkina, Filipe Menezes, Guillaume Godin, Igor V. Tetko
Online OCHEM multi-task model for solubility and lipophilicity prediction of platinum complexes
Journal of Inorganic Biochemistry, vol. 269, Article 112890 (published 2025-03-10). DOI: 10.1016/j.jinorgbio.2025.112890. https://www.sciencedirect.com/science/article/pii/S0162013425000704
Abstract
Predicting the solubility and lipophilicity of platinum(II) and platinum(IV) complexes is essential for prioritizing potential anticancer candidates in drug discovery. This study introduces the first publicly available online model for predicting the solubility of platinum complexes, addressing the scarcity of both published data and predictive models for this compound class. Using a time-split dataset, we developed a consensus model that achieved a root mean squared error (RMSE) of 0.62 in 5-fold cross-validation on a training set of 284 historical compounds (solubility data reported before 2017). However, the RMSE increased to 0.86 on a prospective test set of 108 compounds reported after 2017. Analysis of the largest prediction errors showed that they are primarily attributable to the underrepresentation of novel chemical scaffolds, particularly Pt(IV) derivatives, in the training set. For instance, a series of eight phenanthroline-containing compounds lying outside the chemical space of the training set had an RMSE of 1.3; when the model was redeveloped on the combined dataset, the RMSE for this series decreased to 0.34 under the same validation protocol. Additionally, we developed an interpretable linear model to identify the structural features and functional groups that influence the solubility of platinum complexes. We further confirmed the correlation between solubility and lipophilicity, consistent with the Yalkowsky General Solubility Equation. Building on these insights, we developed a final multitask model that simultaneously predicts solubility and lipophilicity as two endpoints, with RMSE = 0.62 and 0.44, respectively. The data and the final model are available at https://ochem.eu/article/31.
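The abstract reports model quality as RMSE and links solubility to lipophilicity through the Yalkowsky General Solubility Equation (GSE). The Python sketch below shows both in their textbook form: RMSE between paired observed and predicted values, and the GSE, log S = 0.5 - 0.01 (MP - 25) - log P, where S is in mol/L, MP is the melting point in degrees Celsius (the term is set to zero for liquids), and P is the octanol/water partition coefficient. This is an illustration under those standard conventions, not the OCHEM implementation or necessarily the exact formulation used by the authors; the function names `rmse` and `gse_log_s` and the example inputs are hypothetical.

```python
import math
from typing import Sequence


def rmse(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Root mean squared error between paired observed and predicted values."""
    if len(observed) != len(predicted) or len(observed) == 0:
        raise ValueError("observed and predicted must be non-empty and equal in length")
    squared_errors = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def gse_log_s(log_p: float, melting_point_c: float) -> float:
    """Estimate log S (S in mol/L) with the Yalkowsky General Solubility Equation.

    log S = 0.5 - 0.01 * (MP - 25) - log P, with MP in degrees Celsius;
    the melting-point term is set to zero for liquids (MP below 25).
    """
    melting_term = 0.01 * max(melting_point_c - 25.0, 0.0)
    return 0.5 - melting_term - log_p


if __name__ == "__main__":
    # Hypothetical inputs for illustration only (not data from the study).
    print(f"{gse_log_s(log_p=1.0, melting_point_c=125.0):.2f}")  # prints -1.50
    # At a fixed melting point, each unit increase in log P lowers log S by one unit.
    print(f"{gse_log_s(2.0, 125.0) - gse_log_s(1.0, 125.0):.2f}")  # prints -1.00
    print(f"{rmse([0.0, 0.0], [3.0, -3.0]):.2f}")  # prints 3.00
```

At a fixed melting point the GSE gives log S a slope of -1 with respect to log P, which is one way to see why solubility and lipophilicity are expected to be correlated and why predicting both endpoints jointly in a multitask model is plausible.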
About the journal:
The Journal of Inorganic Biochemistry is an established international forum for research in all aspects of Biological Inorganic Chemistry. Original papers of a high scientific level are published in the form of Articles (full length papers), Short Communications, Focused Reviews and Bioinorganic Methods. Topics include: the chemistry, structure and function of metalloenzymes; the interaction of inorganic ions and molecules with proteins and nucleic acids; the synthesis and properties of coordination complexes of biological interest including both structural and functional model systems; the function of metal-containing systems in the regulation of gene expression; the role of metals in medicine; the application of spectroscopic methods to determine the structure of metallobiomolecules; the preparation and characterization of metal-based biomaterials; and related systems. The emphasis of the Journal is on the structure and mechanism of action of metallobiomolecules.