Prediction of Drug-Induced Liver Injury: From Molecular Physicochemical Properties and Scaffold Architectures to Machine Learning Approaches
Yulong Zhao, Zhoudong Zhang, Xiaotian Kong, Kai Wang, Yaxuan Wang, Jie Jia, Huanqiu Li, Sheng Tian
Chemical Biology & Drug Design, vol. 104, no. 2 (published 2024-08-23)
DOI: 10.1111/cbdd.14607
URL: https://onlinelibrary.wiley.com/doi/10.1111/cbdd.14607
Impact factor: 3.2 (JCR Q2, Biochemistry & Molecular Biology)
Citations: 0
Abstract
Developing new drugs is widely acknowledged to be time-intensive and costly. Despite ongoing efforts to reduce the time and expense of drug development, ensuring medication safety remains an urgent problem. One of the major safety problems in drug development is hepatotoxicity, specifically drug-induced liver injury (DILI). DILI often poses a significant barrier for new drugs during development and frequently leads to their recall after launch. In silico methods offer many advantages over traditional in vivo and in vitro assays. Establishing a more precise and reliable prediction model requires an extensive, high-quality database of drug molecule properties and structural patterns, together with carefully selected molecular descriptors that accurately capture compound characteristics. The aim of this study was to conduct a comprehensive investigation into the prediction of DILI. First, we compared the physicochemical properties of carefully prepared sets of DILI-positive and DILI-negative compounds. Then, we used classic substructure dissection methods to identify differences in structural patterns between these two classes of molecules. These analyses indicate that it is not feasible to establish property-based or substructure-based rules for distinguishing DILI-positive from DILI-negative compounds. Finally, we developed quantitative classification models for predicting DILI using the naïve Bayes classifier (NBC) and recursive partitioning (RP) machine learning techniques. The optimal DILI prediction model was obtained with NBC, combining 21 physicochemical properties, the VolSurf descriptors and the LCFP_10 fingerprint set. This model achieved a global accuracy (GA) of 0.855 and an area under the curve (AUC) of 0.704 on the training set; the corresponding values on the test set were 0.619 and 0.674. Moreover, indicative substructural fragments favorable or unfavorable for DILI were identified from the best naïve Bayesian classification model. These findings may help prioritize lead compounds in the early stages of drug development pipelines.
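To make the two reported metrics concrete, the following is a minimal sketch of a naïve Bayes classifier on binary fingerprint bits, scored by global accuracy and AUC. It rests on stated assumptions: the toy fingerprints and labels are invented for illustration and are not data from the study, the function names are hypothetical, and the model is a generic Bernoulli naïve Bayes with Laplace smoothing rather than the NBC implementation, VolSurf descriptors or LCFP_10 fingerprints used by the authors.

```python
import math

def train_bernoulli_nb(X, y, alpha=1.0):
    """Fit a Bernoulli naive Bayes model on binary feature vectors."""
    n_features = len(X[0])
    model = {}
    for label in (0, 1):
        rows = [x for x, t in zip(X, y) if t == label]
        prior = len(rows) / len(X)
        # Laplace-smoothed probability that each bit is set within this class
        p_on = [(sum(r[j] for r in rows) + alpha) / (len(rows) + 2 * alpha)
                for j in range(n_features)]
        model[label] = (math.log(prior), p_on)
    return model

def log_odds(model, x):
    """Log-odds of the positive class; values above 0 favor 'positive'."""
    def joint(label):
        log_prior, p_on = model[label]
        return log_prior + sum(
            math.log(p if bit else 1.0 - p) for bit, p in zip(x, p_on))
    return joint(1) - joint(0)

def global_accuracy(y_true, y_pred):
    """Fraction of compounds whose predicted class matches the label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def auc(y_true, scores):
    """AUC as the chance that a random positive outscores a random negative."""
    pos = [s for s, t in zip(scores, y_true) if t == 1]
    neg = [s for s, t in zip(scores, y_true) if t == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical 4-bit fingerprints (1 = fragment present); labels: 1 = DILI-positive
X_train = [[1, 1, 0, 0], [1, 0, 1, 0], [1, 1, 1, 0],
           [0, 0, 0, 1], [0, 1, 0, 1], [0, 0, 1, 1]]
y_train = [1, 1, 1, 0, 0, 0]
X_test = [[1, 0, 0, 0], [0, 0, 0, 1], [1, 1, 1, 1], [1, 0, 1, 0]]
y_test = [1, 0, 1, 0]

model = train_bernoulli_nb(X_train, y_train)
scores = [log_odds(model, x) for x in X_test]
preds = [1 if s > 0 else 0 for s in scores]

print("GA :", global_accuracy(y_test, preds))
print("AUC:", auc(y_test, scores))
```

In this toy test set one negative compound resembles the positive training compounds, so it is misclassified and also outranks both true positives. Accuracy depends only on the thresholded predictions, whereas AUC depends on the ranking of the scores, which is why the two metrics can diverge.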
About the journal:
Chemical Biology & Drug Design is a peer-reviewed scientific journal that is dedicated to the advancement of innovative science, technology and medicine with a focus on the multidisciplinary fields of chemical biology and drug design. It is the aim of Chemical Biology & Drug Design to capture significant research and drug discovery that highlights new concepts, insight and new findings within the scope of chemical biology and drug design.