ToxSTK: A multi-target toxicity assessment utilizing molecular structure and stacking ensemble learning.

Impact factor: 7 · Zone 2 (Medicine) · JCR Q1 (Biology) · Computers in biology and medicine · Pub Date: 2024-12-06 · DOI: 10.1016/j.compbiomed.2024.109480
Surapong Boonsom, Panisara Chamnansil, Sarote Boonseng, Tarapong Srisongkram
Computers in biology and medicine, volume 185, article 109480. Publication type: Journal Article. Open access: no. https://doi.org/10.1016/j.compbiomed.2024.109480
Citations: 0

Abstract

Drug registration requires risk assessment of new active pharmaceutical ingredients or excipients to ensure they are safe for human health and the environment. However, traditional risk assessment is expensive and relies heavily on animal testing. Machine learning (ML) has been used as a risk assessment tool that requires less time, less money, and fewer animals than in vivo experiments. Even so, ML-based assessments often rely on a single model, which may introduce bias and produce unreliable predictions. Stacking ensemble learning is an ML framework that makes predictions by combining the outputs of multiple models. This framework performs well in quantitative structure-activity relationship (QSAR) studies. In this study, we developed ToxSTK, a multi-target toxicity assessment tool based on stacking ensemble learning. We aimed to create an ML tool that makes toxicity assessment more affordable and less reliant on animal models. We focused on four key targets generally assessed in early-stage drug development: hERG toxicity, mTOR toxicity, PBMC toxicity, and mutagenicity. Our model integrated 12 molecular fingerprints with 3 ML algorithms, generating 36 novel predictive features (PFs). These PFs were then combined to construct the final meta-decision model. Our results demonstrated that the ToxSTK model performs strongly on standard regression and classification metrics, indicating that it is reliable and accurate in predicting chemical toxicities within its application domain. The model passed the y-randomization test, indicating that the identified QSAR is robust and not due to random chance. Additionally, the model outperforms existing ML methods for these endpoints, suggesting its effectiveness for risk assessment applications. We recommend incorporating this stacking ensemble learning framework into the chemical risk assessment pipeline to improve model generalization, accuracy, robustness, and reliability.
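
The feature construction described in the abstract (12 fingerprints × 3 algorithms = 36 predictive features feeding a meta-decision model) can be illustrated with a minimal, pure-Python sketch. Everything concrete here is an assumption for illustration, since the abstract does not name the fingerprints, the algorithms, or the validation scheme: the fingerprints are random bit vectors, the three learners (`fit_knn`, `fit_centroid`, `fit_stump`) are toy stand-ins, the PFs are produced as out-of-fold scores with 5 folds, and the meta-decision is a plain average rather than a trained meta-learner.

```python
import random

N_FINGERPRINTS = 12   # from the abstract
N_ALGORITHMS = 3      # from the abstract
N_BITS = 16           # hypothetical toy fingerprint length
N_MOLECULES = 60      # hypothetical toy dataset size
K_FOLDS = 5           # hypothetical; the abstract does not state the CV scheme


def make_toy_data(rng):
    """Random bit vectors standing in for fingerprints (not real chemistry)."""
    y = [rng.randint(0, 1) for _ in range(N_MOLECULES)]
    fingerprints = []
    for _ in range(N_FINGERPRINTS):
        X = []
        for label in y:
            bits = [rng.randint(0, 1) for _ in range(N_BITS)]
            # Bit 0 agrees with the label 80% of the time, so there is signal.
            bits[0] = label if rng.random() < 0.8 else 1 - label
            X.append(bits)
        fingerprints.append(X)
    return fingerprints, y


def hamming(a, b):
    return sum(p != q for p, q in zip(a, b))


def fit_knn(X, y, k=5):
    """Toy learner 1: fraction of positives among the k nearest neighbours."""
    def predict(x):
        nearest = sorted(range(len(X)), key=lambda i: hamming(X[i], x))[:k]
        return sum(y[i] for i in nearest) / len(nearest)
    return predict


def fit_centroid(X, y):
    """Toy learner 2: relative closeness to the positive-class centroid."""
    def mean_vec(rows):
        return [sum(col) / len(rows) for col in zip(*rows)] if rows else None
    pos = mean_vec([x for x, label in zip(X, y) if label == 1])
    neg = mean_vec([x for x, label in zip(X, y) if label == 0])

    def predict(x):
        if pos is None or neg is None:
            return 0.5
        d_pos = sum((a - b) ** 2 for a, b in zip(x, pos))
        d_neg = sum((a - b) ** 2 for a, b in zip(x, neg))
        total = d_pos + d_neg
        return d_neg / total if total else 0.5
    return predict


def fit_stump(X, y):
    """Toy learner 3: positive rate conditioned on the most informative bit."""
    def agreement(j):
        return abs(sum(1 if x[j] == label else -1 for x, label in zip(X, y)))
    best = max(range(len(X[0])), key=agreement)

    def rate(value):
        labels = [label for x, label in zip(X, y) if x[best] == value]
        return sum(labels) / len(labels) if labels else 0.5
    rates = {0: rate(0), 1: rate(1)}
    return lambda x: rates[x[best]]


def out_of_fold_scores(fit, X, y, k_folds):
    """Score each molecule with a model that never saw it during fitting."""
    n = len(y)
    scores = [0.0] * n
    for fold in range(k_folds):
        train = [i for i in range(n) if i % k_folds != fold]
        predict = fit([X[i] for i in train], [y[i] for i in train])
        for i in range(fold, n, k_folds):
            scores[i] = predict(X[i])
    return scores


def build_predictive_features(fingerprints, y, learners, k_folds):
    """One PF column per (fingerprint, learner) pair, returned row-wise."""
    columns = [out_of_fold_scores(fit, X, y, k_folds)
               for X in fingerprints for fit in learners]
    return [list(row) for row in zip(*columns)]


def meta_predict(pf_row):
    """Stand-in meta-decision: average the PFs and threshold at 0.5."""
    return 1 if sum(pf_row) / len(pf_row) >= 0.5 else 0


rng = random.Random(0)
fingerprints, labels = make_toy_data(rng)
learners = [fit_knn, fit_centroid, fit_stump]
pf_rows = build_predictive_features(fingerprints, labels, learners, K_FOLDS)
predictions = [meta_predict(row) for row in pf_rows]
accuracy = sum(p == t for p, t in zip(predictions, labels)) / len(labels)
print(len(pf_rows), "molecules x", len(pf_rows[0]), "predictive features")
print("toy accuracy:", round(accuracy, 2))
```

Using out-of-fold scores is a common design choice in stacking because it keeps the meta-level model from seeing predictions made on training data. Whether ToxSTK builds its PFs this way is not stated in the abstract.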

Source journal
Computers in biology and medicine (Engineering and Technology: Biomedical Engineering)
CiteScore: 11.70
Self-citation rate: 10.40%
Articles published: 1086
Review time: 74 days
Journal description: Computers in Biology and Medicine is an international forum for sharing groundbreaking advancements in the use of computers in bioscience and medicine. This journal serves as a medium for communicating essential research, instruction, ideas, and information regarding the rapidly evolving field of computer applications in these domains. By encouraging the exchange of knowledge, we aim to facilitate progress and innovation in the utilization of computers in biology and medicine.