
Chem-Bio Informatics Journal: Latest Articles

Quantitative prediction of hERG inhibitory activities using support vector regression and the integrated hERG dataset in AMED cardiotoxicity database
IF 0.3 Q4 BIOCHEMISTRY & MOLECULAR BIOLOGY Pub Date : 2021-10-01 DOI: 10.1273/cbij.21.70
Tomohiro Sato, Hitomi Yuki, T. Honma
Inhibition of the hERG potassium channel is closely related to QT interval prolongation, and assessing this risk could contribute greatly to the development of safer therapeutic compounds. In the hit-to-lead optimization stage of drug development, quantitative prediction of hERG inhibitory activity is crucial for designing drug candidates without cardiotoxicity risk. Here, we developed a hERG regression model that combines support vector regression (SVR) with descriptor selection by the non-dominated sorting genetic algorithm (NSGA-II). The model is based on the AMED cardiotoxicity database, which consists of hERG blocking information built by integrating public and commercial databases. To construct the regression model, 6,561 compounds with IC50 and/or Ki values were taken from the AMED cardiotoxicity database and randomly separated into a training set (70%) for model building and a test set (30%) for performance evaluation. To avoid overfitting caused by many non-relevant explanatory variables, NSGA-II, a variant of the genetic algorithm for multi-objective optimization, was used for descriptor selection so as to maximize Q2 and minimize RMSE in 5-fold cross-validation while simultaneously minimizing the number of descriptors used. The prediction performance was then compared with that of ADMET Predictor, commercial software that provides various ADMET property predictions. The SVR model recorded an R2 of 0.594 and an RMSE of 0.604 for the test set, clearly outperforming ADMET Predictor (0.134 and 0.690, respectively). The regression model is available at our home page (https://drugdesign.riken.jp/hERG).
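The evaluation protocol described in this abstract (a random 70/30 split, then R2 and RMSE on the held-out set) can be sketched in a few lines. This is only an illustration under stated assumptions: the function names are hypothetical, the R2 definition used here (1 minus the ratio of residual to total sum of squares) is a common convention that the abstract does not confirm, and no SVR model or NSGA-II descriptor selection is reproduced.

```python
import math
import random


def train_test_split(items, train_fraction=0.7, seed=0):
    """Randomly split items into a training list and a test list."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is repeatable
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


def rmse(observed, predicted):
    """Root mean squared error between observed and predicted values."""
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def r_squared(observed, predicted):
    """Assumed definition: 1 - SS_res / SS_tot, computed on the given data."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


if __name__ == "__main__":
    # Compound indices stand in for the 6,561 database entries.
    train_ids, test_ids = train_test_split(range(6561))
    print(len(train_ids), len(test_ids))
```

With this definition, a model that always predicts the mean of the observed test values scores an R2 of exactly 0, which gives a baseline for reading the reported values.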
Citations: 0
Appropriate Evaluation Measurements for Regression Models
IF 0.3 Q4 BIOCHEMISTRY & MOLECULAR BIOLOGY Pub Date : 2021-09-15 DOI: 10.1273/cbij.21.59
Tsuyoshi Esaki
In recent years, finding seed compounds faster and reducing the cost of pharmaceutical research have become necessities. In silico drug discovery methods, which predict new drug candidates from the physicochemical features and substructure fingerprints of compounds, are therefore expected to contribute. Selecting seed compounds without conducting experiments could reduce the time and cost required for drug development. However, estimating the characteristics of compounds in the body using a simple linear model alone is unsatisfactory, because the effects and distribution of compounds are determined by the environment in the body and by their interactions with other molecules. More complex models have therefore been developed to estimate compound characteristics with high predictive accuracy. Thus, it is increasingly important to evaluate predictive performance correctly when selecting models appropriate for a research purpose. The coefficient of determination, known as R2, is one of the best-known statistical measures for evaluating regression models. However, this measure cannot be used to evaluate nonlinear models. In this paper, the difficulty of using the coefficient of determination is explained, and appropriate statistical measures are suggested for two situations: mean squared error (MSE) for cross-validation, and MSE together with the correlation coefficient between the observed and predicted values for test data. Because statistical measures must be understood and used appropriately, the suggested measures will support the effective selection of promising seed compounds and accelerate drug discovery.
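One way to see why MSE and a correlation coefficient are suggested together for test data is a toy case in which predictions follow the observed trend perfectly but carry a constant offset. The sketch below uses hypothetical function names and made-up numbers, and it assumes the Pearson correlation coefficient, which the abstract does not specify.

```python
import math


def mean_squared_error(observed, predicted):
    """Average squared difference between observed and predicted values."""
    return sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed)


def pearson_r(observed, predicted):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(observed)
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov / math.sqrt(var_o * var_p)


# Illustrative data: the predictions track the trend but sit 2 units too high.
observed = [1.0, 2.0, 3.0, 4.0]
shifted = [3.0, 4.0, 5.0, 6.0]

if __name__ == "__main__":
    print("r   =", pearson_r(observed, shifted))
    print("MSE =", mean_squared_error(observed, shifted))
```

Here the correlation is perfect while the MSE is 4, so the correlation alone would hide the systematic error and the MSE alone would hide the fact that the ranking of compounds is correct.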
Citations: 1
Logistic regression and random forest unveil key molecular descriptors of druglikeness
IF 0.3 Q4 BIOCHEMISTRY & MOLECULAR BIOLOGY Pub Date : 2021-09-08 DOI: 10.1273/CBIJ.21.39
L. T. Billones, Nadia B. Morales, J. Billones
The identification of molecular descriptors that embody the chemical information for druglikeness will be a step forward in data-driven drug discovery and development. In this study, over 4000 Dragon-type molecular properties were generated for approximately 2000 known drugs and 2000 surrogate nondrugs. Logistic Regression (LogR) and Random Forest (RF) techniques were used to unveil the crucial molecular descriptors that can adequately classify a compound as a drug or nondrug. Ten one-variable LogR models each demonstrated at least 70% prediction accuracy. A two-variable model consisting of HVcpx and MDDD correctly classified 85% of the test compounds. The best LogR model, with 89.0% prediction accuracy, identified the five most influential descriptors for druglikeness: an information index, HVcpx; a topological index, MDDD; a ring descriptor, NNRS; the average connectivity index of order 2, X2A; and a walk and path count, SRW05. The best RF model, involving 10 descriptors that are only weakly correlated, was 92.5% accurate and on par with the RF and LogR models that consisted of over 200 variables. The model featured: molecular weight, MW; average molecular weight, AMW; rotatable bond fraction, RBF; percentage carbon, C%; maximal electrotopological negative variation, MAXDN; all-path Wiener index, Wap; structural information content index (neighborhood symmetry of order 1), SIC1; number of nitrogen atoms, nN; 2D Petitjean shape index, PJI2; and self-returning walk count of order 5, SRW05. Many of these descriptors have straightforward chemical interpretability and future applicability as druglikeness filters in virtual high-throughput drug discovery.
Citations: 1
Inference of genetic networks using random forests: performance improvement using a new variable importance measure
IF 0.3 Q4 BIOCHEMISTRY & MOLECULAR BIOLOGY Pub Date : 2021-07-26 DOI: 10.21203/rs.3.rs-737867/v1
Shuhei Kimura, Yahiro Takeda, M. Tokuhisa, Mariko Okada
Background: Among the various methods proposed so far for genetic network inference, this study focuses on random-forest-based methods. In the random-forest-based approach, confidence values are assigned to all candidate regulations. To our knowledge, all random-forest-based methods make these assignments using the standard variable importance measure defined in tree-based machine learning techniques. We think, however, that this measure has drawbacks for the inference of genetic networks. Results: We therefore propose an alternative measure, which we call "the random-input variable importance measure," and design a new inference method that uses it in place of the standard measure in the existing random-forest-based inference method. Numerical experiments show that using the random-input variable importance measure improves the performance of the existing random-forest-based inference method by as much as 45.5% with respect to the area under the recall-precision curve (AURPC). Conclusion: This study proposed the random-input variable importance measure for the inference of genetic networks. Using our measure improved the performance of the random-forest-based inference method. We checked the performance of the proposed measure on only several genetic network inference problems. However, the experimental results suggest that the proposed measure will also work well in other applications of random forests.
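To make the evaluation metric concrete, the sketch below ranks candidate regulations by confidence value and accumulates a step-wise area under the recall-precision curve (often called average precision). It is an assumption that this step-wise form is close to what the authors computed; the abstract does not say how the AURPC was integrated. The gene names, scores, and function name are hypothetical.

```python
def area_under_recall_precision(scored_edges, true_edges):
    """Step-wise area under the recall-precision curve.

    scored_edges: list of ((regulator, target), confidence) pairs.
    true_edges: set of (regulator, target) pairs present in the real network.
    """
    if not true_edges:
        return 0.0
    # Highest-confidence candidate regulations come first.
    ranked = sorted(scored_edges, key=lambda item: item[1], reverse=True)
    hits = 0
    area = 0.0
    for rank, (edge, _confidence) in enumerate(ranked, start=1):
        if edge in true_edges:
            hits += 1
            area += hits / rank  # precision at the point where recall increases
    return area / len(true_edges)


# Hypothetical confidence values for four candidate regulations.
candidates = [
    (("g1", "g2"), 0.9),
    (("g3", "g2"), 0.8),
    (("g1", "g3"), 0.4),
    (("g2", "g1"), 0.1),
]
truth = {("g1", "g2"), ("g1", "g3")}

if __name__ == "__main__":
    print(area_under_recall_precision(candidates, truth))
```

Any importance measure that moves true regulations toward the top of the ranking raises this value, so the metric can compare two ways of assigning confidence without fixing a threshold.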
Citations: 0
Web Server with a Simple Interface for Coarse-grained Molecular Dynamics of DNA Nanostructures
IF 0.3 Q4 BIOCHEMISTRY & MOLECULAR BIOLOGY Pub Date : 2021-04-30 DOI: 10.1273/CBIJ.21.28
Yudai Yamashita, Kotaro Watanabe, S. Murata, I. Kawamata
We introduce an automated procedure for coarse-grained molecular dynamics simulation of DNA nanostructures, which have great potential for realizing molecular robotics. As DNA origami is now a standardized technology for fabricating DNA nanostructures with high precision, various computer-aided design software has been developed. For example, caDNAno, a design tool with a simple and intuitive interface, is widely used for designing DNA origami structures. Further, the simulation tool oxDNA is used to predict the behavior of such nanostructures based on coarse-grained molecular dynamics. These tools, however, are not linked directly; thus, repeating the cycle of design and simulation is cumbersome for the user. Moreover, the computer skills required to set up, launch, and run an oxDNA simulation are a potential barrier for non-experts. In our proposal, an oxDNA simulation can be launched on a web server simply by providing a caDNAno file; the web server then analyzes the simulation results and provides a visual response. The validity of the proposal is demonstrated using an example. The advantages of our proposed method over conventional methods are also described. This simple-to-use interface for user-friendly simulation of DNA origami reduces the burden on users and accelerates the design process of complicated DNA nanostructures such as wireframe architectures.
Citations: 2
In Silico Discovery of Natural Products Against Dengue RNA-Dependent RNA Polymerase Drug Target
IF 0.3 Q4 BIOCHEMISTRY & MOLECULAR BIOLOGY Pub Date : 2021-03-15 DOI: 10.1273/CBIJ.21.11
J. Billones, N. A. B. Clavio
The viral infection caused by the dengue virus (DENV) is one of the most challenging diseases in the tropical regions of the world. The absence of drugs for dengue to date calls for intense efforts to discover and develop the much-coveted therapeutics for this mosquito-borne disease. One of the most attractive antiviral targets is the DENV RNA-dependent RNA polymerase (RdRp), which catalyzes the de novo initiation as well as the elongation of the flavivirus RNA genome. In this work, almost 5000 natural products were docked to DENV RdRp. The top 197 molecules with greater binding energies than the known ligand of the target were further clustered to furnish 35 classes of molecular structures. These compounds, which have satisfactory predicted drug properties and known natural origins, can be further explored to pave the way for the first anti-dengue drug.
Citations: 0
Combining self-organizing maps and hierarchical clustering for protein–ligand interaction analysis in post-fragment molecular orbital calculation
IF 0.3 Q4 BIOCHEMISTRY & MOLECULAR BIOLOGY Pub Date : 2021-01-29 DOI: 10.1273/CBIJ.21.1
Y. Kawashima, Natsumi Mori, N. Kawashita, Yu-Shi Tian, T. Takagi
Fragment molecular orbital (FMO) calculation is a useful ab initio method for analyzing protein–ligand interactions in current structure-based drug design. When multiple ligands exist for one receptor, a post-FMO calculation tool is required because this method produces a large number of interaction energy decomposition terms. In this study, a method that combines self-organizing maps (SOM) and hierarchical clustering analysis (HCA) was proposed to analyze the FMO energy components. This method can effectively compress the high-dimensional energy terms and is expected to be useful for analyzing the interactions between a protein and its ligands. A case study of the anti-type 2 diabetes mellitus target DPP-IV and its inhibitors was analyzed to verify the feasibility of the proposed method. After performing dimensional compression using SOM and further grouping using HCA, we obtained superclasses of the inhibitors based on the dispersion energy (DI). These superclasses were consistent with structural information, indicating that further analysis of the detailed energies per superclass can be an effective approach for identifying important ligand–protein interactions.
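The grouping stage of such a pipeline can be illustrated with a small agglomerative clustering routine. This sketch covers only the hierarchical clustering step and does not train a self-organizing map; the input vectors stand in for hypothetical SOM node weights, and the choice of Euclidean distance with average linkage is an assumption, since the abstract does not state which distance or linkage was used.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def average_linkage(cluster_a, cluster_b, vectors):
    """Mean pairwise distance between the members of two clusters."""
    dists = [euclidean(vectors[i], vectors[j]) for i in cluster_a for j in cluster_b]
    return sum(dists) / len(dists)


def agglomerate(vectors, n_clusters):
    """Merge the closest pair of clusters until n_clusters remain."""
    clusters = [[i] for i in range(len(vectors))]  # start with singletons
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = average_linkage(clusters[a], clusters[b], vectors)
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]  # merge b into a
        del clusters[b]
    return sorted(sorted(c) for c in clusters)


# Hypothetical 2-D node weights; real energy-term vectors would be much longer.
node_weights = [[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0], [10.0, 0.0]]

if __name__ == "__main__":
    print(agglomerate(node_weights, 3))
```

Each returned group lists the indices of the nodes merged into one superclass; ligands mapped to those nodes would then be examined together.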
Citations: 0
Estimation of relationships between chemical substructures and antibiotic resistance-related gene expression in bacteria: Adapting a canonical correlation analysis for small sample data of gathered features using consensus clustering
IF 0.3 Q4 BIOCHEMISTRY & MOLECULAR BIOLOGY Pub Date : 2020-09-30 DOI: 10.1273/CBIJ.20.58
Tsuyoshi Esaki, Takaaki Horinouchi, Yayoi Natsume-Kitatani, Yosui Nojima, I. Sakane, H. Matsui
The emergence of antibiotic-resistant bacteria is a serious public health concern. Understanding the relationships between antibiotic compounds and phenotypic changes related to the acquisition of resistance is important to estimate the effective characteristics of drug seeds. It is important to analyze the relationships between phenotypic changes and compound structures; hence, we performed a canonical correlation analysis (CCA) for high dimensional phenotypic and compound structure datasets. For the CCA, the required sample number must be larger than the feature number; however, collecting a large amount of data can sometimes be difficult. Thus, we combined consensus clustering to gather and reduce features. The CCA was performed using the clustered features, and it revealed relationships between the features of chemical substructures and the expression level of genes related to several types of antibiotic resistance.
Citations: 0
Skin sensitizer classification using dual-input machine learning model
IF 0.3 Q4 BIOCHEMISTRY & MOLECULAR BIOLOGY Pub Date : 2020-09-11 DOI: 10.1273/cbij.20.54
K. Matsumura
Skin sensitization is an important aspect of occupational and consumer safety. Because of the ban on animal testing for skin sensitization in Europe, in silico approaches to predict skin sensitizers are needed. Recently, several machine learning approaches, such as the gradient boosting decision tree (GBDT) and deep neural networks (DNNs), have been applied to chemical reactivity prediction, showing remarkable accuracy. Herein, we performed a study on DNN- and GBDT-based modeling to investigate their potential for use in predicting skin sensitizers. We separately input two types of chemical properties (physical and structural properties) in the form of one-hot labeled vectors into single- and dual-input models. All the trained dual-input models achieved higher accuracy than single-input models, suggesting that a multi-input machine learning model with different types of chemical properties has excellent potential for skin sensitizer classification.
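As a rough illustration of the input layout described in this abstract, the sketch below builds one-hot vectors for two property types and arranges them for a single-input and a dual-input model. The category counts and indices are hypothetical, and no model is trained; the abstract does not specify how the properties were binned or how the networks were built.

```python
def one_hot(index, size):
    """Return a list of `size` zeros with a single 1 at position `index`."""
    if not 0 <= index < size:
        raise ValueError("index out of range for one-hot vector")
    return [1 if i == index else 0 for i in range(size)]


# Hypothetical encodings: a physical property falling in bin 2 of 4,
# and a structural property falling in class 0 of 3.
physical = one_hot(2, 4)
structural = one_hot(0, 3)

# Single-input layout: one concatenated vector fed to one input branch.
single_input = physical + structural

# Dual-input layout: the two property types stay separate, so a model
# can process each with its own branch before merging them.
dual_input = (physical, structural)

print(single_input)
print(dual_input)
```

The only difference between the two layouts is whether the property types are merged before or inside the model, which is the comparison the abstract reports.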
Citations: 1
A distribution-dependent analysis of open-field test movies
IF 0.3 Q4 BIOCHEMISTRY & MOLECULAR BIOLOGY Pub Date : 2020-08-31 DOI: 10.1273/cbij.20.44
T. Konishi, Haruna Ohrui
Although the open-field test has been widely used, its reliability and compatibility are frequently questioned. Many indicator parameters have been introduced for this test; however, they do not take data distributions into consideration, and this oversight may have caused these problems. Here, video records of tests of elderly mice were analyzed with an exploratory approach that described the distributions using the fewest possible parameters. The locomotor activity of the animals was separated into two clusters: dash and search. The accelerations found in each cluster were normally distributed. The speed and the duration of the clusters followed an exponential distribution. Although the exponential model has a single parameter, in many cases an additional parameter indicating instability of the behaviour was required to fit the data. Because this instability parameter was inversely correlated with speed, the brain function that maintains stability appears to be required for better performance. According to the distributions, the travel distance, which has been regarded as an important indicator, was not a robust estimator of the animals’ condition.
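The single-parameter exponential model mentioned in this abstract can be sketched as follows. The maximum-likelihood estimate of the rate is the reciprocal of the sample mean. The durations below are hypothetical values, not data from the paper, and the additional instability parameter described in the abstract is not modelled here.

```python
import math


def fit_exponential_rate(samples):
    """Maximum-likelihood rate of an exponential distribution: n / sum(x)."""
    if not samples or any(x < 0 for x in samples):
        raise ValueError("need a non-empty list of non-negative values")
    total = sum(samples)
    if total == 0:
        raise ValueError("rate is undefined when all samples are zero")
    return len(samples) / total


def exponential_survival(rate, t):
    """Probability that an exponentially distributed value exceeds t."""
    return math.exp(-rate * t)


# Hypothetical durations (in seconds) of one behavioural cluster.
durations = [0.5, 1.0, 1.5, 3.0, 4.0]

rate = fit_exponential_rate(durations)  # 5 samples / 10.0 s total
median = math.log(2) / rate             # median implied by the fitted model
print(rate, median, exponential_survival(rate, 2.0))
```

Comparing the model's implied median with the observed one is a quick check of whether a single rate parameter describes the data adequately.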
Citations: 1