Online OCHEM multi-task model for solubility and lipophilicity prediction of platinum complexes

Journal of Inorganic Biochemistry (IF 3.8; Zone 2, Chemistry; JCR Q2, Biochemistry & Molecular Biology). Pub Date: 2025-03-10. DOI: 10.1016/j.jinorgbio.2025.112890
Nesma Mousa, Hristo P. Varbanov, Vidya Kaipanchery, Elisabetta Gabano, Mauro Ravera, Andrey A. Toropov, Larisa Charochkina, Filipe Menezes, Guillaume Godin, Igor V. Tetko
Journal of Inorganic Biochemistry, volume 269, Article 112890. Publisher page: https://www.sciencedirect.com/science/article/pii/S0162013425000704

Abstract

Predicting the solubility and lipophilicity of platinum(II, IV) complexes is essential for prioritizing potential anticancer candidates in drug discovery. This study introduces the first publicly available online model for predicting the solubility of platinum complexes, addressing the lack of literature and models in this regard. Using a time-split dataset, we developed a consensus model with a Root Mean Squared Error (RMSE) of 0.62 through 5-fold cross-validation on a training set of 284 historical compounds (solubility data reported prior to 2017). However, the RMSE increased to 0.86 when the model was applied to a prospective test set of 108 compounds reported after 2017. Further analysis of the high prediction errors revealed that these inaccuracies are primarily attributable to the underrepresentation of novel chemical scaffolds, particularly Pt(IV) derivatives, in the training set. For instance, a series of eight phenanthroline-containing compounds, not covered by the training set's chemical space, had an RMSE of 1.3. When the model was redeveloped using a combined dataset, the RMSE of this series decreased significantly to 0.34 under the same validation protocol. Additionally, we developed an interpretable linear model to identify structural features and functional groups that influence the solubility of platinum complexes. We further validated the correlation between solubility and lipophilicity, consistent with the Yalkowsky General Solubility Equation. Building on these insights, we developed a final multitask model that simultaneously predicts solubility and lipophilicity as two endpoints with RMSE = 0.62 and 0.44, respectively. The data and the final model are available at https://ochem.eu/article/31.
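
The evaluation ideas named in the abstract (a time-split dataset, RMSE, and the General Solubility Equation) can be illustrated with a minimal Python sketch. This is not the authors' OCHEM pipeline. The record layout, the function names, and the treatment of the cutoff year itself are assumptions made for illustration; the only formula taken from the literature is the standard form of the General Solubility Equation, log S = 0.5 - 0.01 (MP - 25) - log P.

```python
import math


def rmse(observed, predicted):
    """Root mean squared error between two equal-length sequences."""
    if len(observed) != len(predicted) or len(observed) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def time_split(records, cutoff_year):
    """Split records into (historical, prospective) sets by report year.

    Assumption: records are dicts with a "year" key, and the cutoff year
    itself is placed in the prospective set. The abstract does not say
    how compounds reported in 2017 were assigned.
    """
    historical = [r for r in records if r["year"] < cutoff_year]
    prospective = [r for r in records if r["year"] >= cutoff_year]
    return historical, prospective


def gse_log_s(log_p, melting_point_c=25.0):
    """General Solubility Equation: log S = 0.5 - 0.01*(MP - 25) - log P.

    The melting-point term is set to zero for compounds melting at or
    below 25 degrees C, as in the usual statement of the equation.
    """
    return 0.5 - 0.01 * (max(melting_point_c, 25.0) - 25.0) - log_p


if __name__ == "__main__":
    # Hypothetical toy records, not data from the study.
    records = [
        {"id": "A", "year": 2010, "log_s": -2.0},
        {"id": "B", "year": 2015, "log_s": -3.1},
        {"id": "C", "year": 2019, "log_s": -4.0},
    ]
    train, test = time_split(records, cutoff_year=2017)
    print(len(train), "historical,", len(test), "prospective")
    print("RMSE:", rmse([-2.0, -3.1], [-2.2, -2.9]))
    print("GSE log S:", gse_log_s(log_p=1.0, melting_point_c=125.0))
```

A time split of this kind tests a model on compounds reported after those it was trained on, which is why the prospective RMSE in the abstract is higher than the cross-validated one.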
