Pub Date: 2024-11-01 | Epub Date: 2024-10-15 | DOI: 10.1002/minf.202400082
Igor Baskin, Yair Ein-Eli
This paper reviews the application of machine learning to the inhibition of corrosion by organic molecules. The methodologies considered include quantitative structure-property relationships (QSPR) and related data-driven approaches. The characteristic features of their key components are considered as applied to corrosion inhibition, including datasets, response properties, molecular descriptors, machine learning methods, and structure-property models. It is shown that the most important factors determining their choice and application are: (1) the small or very small size of the datasets, (2) the mechanism of corrosion inhibition, which is associated with the adsorption of inhibitor molecules on the metal surface, and (3) the multifactorial conditioning and noisiness of the response property. On this basis, the application of machine learning to the inhibition of corrosion of materials based on iron, aluminum, and magnesium is considered. The main trends in the development of QSPR and related data-driven modeling of corrosion inhibition are discussed, the shortcomings and common errors are considered, and the prospects for further development are outlined.
{"title":"Chemoinformatics for corrosion science: Data-driven modeling of corrosion inhibition by organic molecules.","authors":"Igor Baskin, Yair Ein-Eli","doi":"10.1002/minf.202400082","DOIUrl":"10.1002/minf.202400082","url":null,"abstract":"<p><p>This paper reviews the application of machine learning to the inhibition of corrosion by organic molecules. The methodologies considered include quantitative structure-property relationships (QSPR) and related data-driven approaches. The characteristic features of their key components are considered as applied to corrosion inhibition, including datasets, response properties, molecular descriptors, machine learning methods, and structure-property models. It is shown that the most important factors determining their choice and application features are: (1) the small or very small size of datasets, (2) the mechanism of corrosion inhibition associated with the adsorption of inhibitor molecules on the metal surface, and (3) multifactorial conditioning and noisiness of response property. On this basis, the application of machine learning to the inhibition of corrosion of materials based on iron, aluminum, and magnesium is considered. 
The main trends in the development of QSPR and related data-driven modeling of corrosion inhibition are discussed, the shortcomings and common errors are considered, and the prospects for their further development are outlined.</p>","PeriodicalId":18853,"journal":{"name":"Molecular Informatics","volume":" ","pages":"e202400082"},"PeriodicalIF":2.8,"publicationDate":"2024-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142470299","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2024-11-01 | Epub Date: 2024-10-15 | DOI: 10.1002/minf.202400036
Johann Gasteiger
{"title":"My 50 Years with Chemoinformatics.","authors":"Johann Gasteiger","doi":"10.1002/minf.202400036","DOIUrl":"10.1002/minf.202400036","url":null,"abstract":"","PeriodicalId":18853,"journal":{"name":"Molecular Informatics","volume":" ","pages":"e202400036"},"PeriodicalIF":2.8,"publicationDate":"2024-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142470302","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2024-10-01 | Epub Date: 2024-07-08 | DOI: 10.1002/minf.202400079
Moritz Walter, Jens M Borghardt, Lina Humbeck, Miha Skalic
ADME (Absorption, Distribution, Metabolism, Excretion) properties are key parameters for judging whether a drug candidate exhibits a desired pharmacokinetic (PK) profile. In this study, we tested multi-task machine learning (ML) models for predicting ADME and animal PK endpoints, trained on in-house data generated at Boehringer Ingelheim. Models were evaluated both at the design stage of a compound (i.e., no experimental data for the test compounds are available) and at the testing stage, when a particular assay would be conducted (i.e., experimental data from earlier assays may be available). Using realistic time-splits, we found a clear performance benefit of multi-task graph-based neural network models over single-task models, which was even stronger when experimental data from earlier assays were available. In an attempt to explain the success of multi-task models, we found that the endpoints with the largest numbers of data points (physicochemical endpoints, clearance in microsomes) are especially responsible for the increased predictivity for more complex ADME and PK endpoints. In summary, our study provides insight into how data for multiple ADME/PK endpoints in a pharmaceutical company can best be leveraged to optimize the predictivity of ML models.
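The time-split evaluation mentioned above can be illustrated with a minimal sketch. It assumes each record carries a measurement date and that everything measured before a cutoff is used for training, with later measurements held out for testing. The record fields, compound IDs, dates, values, and the `time_split` helper are all hypothetical and are not taken from the study.

```python
from datetime import date


def time_split(records, cutoff):
    """Split records chronologically.

    Records measured before `cutoff` go to the training set; records
    measured on or after `cutoff` go to the test set. This mimics the
    prospective use of a model on compounds made later in a project.
    """
    train = [r for r in records if r["measured"] < cutoff]
    test = [r for r in records if r["measured"] >= cutoff]
    return train, test


# Hypothetical toy records (invented for illustration only).
records = [
    {"id": "cpd-1", "measured": date(2021, 3, 1), "value": 1.2},
    {"id": "cpd-2", "measured": date(2021, 9, 15), "value": 2.4},
    {"id": "cpd-3", "measured": date(2022, 2, 10), "value": 0.7},
    {"id": "cpd-4", "measured": date(2022, 6, 30), "value": 3.1},
]

train, test = time_split(records, cutoff=date(2022, 1, 1))
print([r["id"] for r in train])
print([r["id"] for r in test])
```

Unlike a random split, a chronological split never lets the model train on compounds measured after the ones it is tested on, which is why it tends to give a more conservative performance estimate.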
{"title":"Multi-Task ADME/PK prediction at industrial scale: leveraging large and diverse experimental datasets.","authors":"Moritz Walter, Jens M Borghardt, Lina Humbeck, Miha Skalic","doi":"10.1002/minf.202400079","DOIUrl":"10.1002/minf.202400079","url":null,"abstract":"<p><p>ADME (Absorption, Distribution, Metabolism, Excretion) properties are key parameters to judge whether a drug candidate exhibits a desired pharmacokinetic (PK) profile. In this study, we tested multi-task machine learning (ML) models to predict ADME and animal PK endpoints trained on in-house data generated at Boehringer Ingelheim. Models were evaluated both at the design stage of a compound (i. e., no experimental data of test compounds available) and at testing stage when a particular assay would be conducted (i. e., experimental data of earlier conducted assays may be available). Using realistic time-splits, we found a clear benefit in performance of multi-task graph-based neural network models over single-task model, which was even stronger when experimental data of earlier assays is available. In an attempt to explain the success of multi-task models, we found that especially endpoints with the largest numbers of data points (physicochemical endpoints, clearance in microsomes) are responsible for increased predictivity in more complex ADME and PK endpoints. 
In summary, our study provides insight into how data for multiple ADME/PK endpoints in a pharmaceutical company can be best leveraged to optimize predictivity of ML models.</p>","PeriodicalId":18853,"journal":{"name":"Molecular Informatics","volume":" ","pages":"e202400079"},"PeriodicalIF":2.8,"publicationDate":"2024-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141555197","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2024-10-01 | Epub Date: 2024-07-24 | DOI: 10.1002/minf.202400046
Guillaume Patient, Corentin Bedart, Naim A Khan, Nicolas Renault, Amaury Farce
FFA4 has gained interest since its deorphanization in 2005 and the characterization of the free fatty acid receptor family for its therapeutic potential in metabolic disorders. The expression of FFA4 (also known as GPR120) in numerous organs throughout the human body makes this receptor a highly promising target, particularly in fat sensing and diet preference. This offers an attractive approach to tackling obesity and related metabolic diseases. Recent cryo-EM structures of the receptor have provided valuable information on a potential active state, although previous studies of FFA4 presented diverging information. We performed molecular docking and molecular dynamics simulations of four agonist ligands (TUG-891, linoleic acid, α-linolenic acid, and oleic acid) based on a homology model. Our simulations, totaling 2 μs, highlighted two binding hotspots at Arg99^2.64 and Lys293 (ECL3). The results indicate that these residues are located in separate areas of the binding pocket and interact with different types of ligands, implying different potential active states of FFA4 and a highly adaptable intra-receptor binding pocket. This article proposes additional structural characteristics and mechanisms for agonist binding that complement the experimental structures.
{"title":"Distinct binding hotspots for natural and synthetic agonists of FFA4 from in silico approaches.","authors":"Guillaume Patient, Corentin Bedart, Naim A Khan, Nicolas Renault, Amaury Farce","doi":"10.1002/minf.202400046","DOIUrl":"10.1002/minf.202400046","url":null,"abstract":"<p><p>FFA4 has gained interest in recent years since its deorphanization in 2005 and the characterization of the Free Fatty Acids receptors family for their therapeutic potential in metabolic disorders. The expression of FFA4 (also known as GPR120) in numerous organs throughout the human body makes this receptor a highly potent target, particularly in fat sensing and diet preference. This offers an attractive approach to tackle obesity and related metabolic diseases. Recent cryo-EM structures of the receptor have provided valuable information for a potential active state although the previous studies of FFA4 presented diverging information. We performed molecular docking and molecular dynamics simulations of four agonist ligands, TUG-891, Linoleic acid, α-Linolenic acid, and Oleic acid, based on a homology model. Our simulations, which accumulated a total of 2 μs of simulation, highlighted two binding hotspots at Arg99<sup>2.64</sup> and Lys293 (ECL3). The results indicate that the residues are located in separate areas of the binding pocket and interact with various types of ligands, implying different potential active states of FFA4 and a highly adaptable binding intra-receptor pocket. 
This article proposes additional structural characteristics and mechanisms for agonist binding that complement the experimental structures.</p>","PeriodicalId":18853,"journal":{"name":"Molecular Informatics","volume":" ","pages":"e202400046"},"PeriodicalIF":2.8,"publicationDate":"2024-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141752164","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2024-10-01 | Epub Date: 2024-08-07 | DOI: 10.1002/minf.202400008
Shivam Kumar Vyas, Avik Das, Upadhyayula Suryanarayana Murty, Vaibhav A Dixit
Sulphotransferases (SULTs) are a major class of phase II metabolic enzymes, contributing ~20% of the phase II metabolism of FDA-approved drugs. Ignoring the potential for SULT-mediated metabolism leaves a strong potential for drug-drug interactions, often causing late-stage drug discovery failures or black-box warnings on FDA labels. Existing models use only accessibility descriptors and machine learning (ML) methods to predict substrate class and site of sulfonation (SOS) for SULTs. In this study, a variety of accessibility, reactivity, and hybrid models and algorithms were developed to make accurate substrate and SOS predictions. Unlike the literature models, a reactivity parameter for aliphatic or aromatic hydroxyl groups (R/Ar-O-H), the bond dissociation energy (BDE), gave accurate models with a true positive rate (TPR) of 0.84 for SOS predictions. We offer mechanistic insights to explain these novel findings, which are not recognized in the literature. Accessibility parameters such as the ratio of Chemgauss4 score (CGS) to molecular weight (MW), CGS/MW, and the distance from the cofactor (Dis) were essential for class predictions and showed TPR=0.72. Substrates consistently had lower BDE, Dis, and CGS/MW than non-substrates. Hybrid models also performed acceptably for SOS predictions. Using the best models, the algorithms gave acceptable performance in class prediction (TPR=0.62, false positive rate (FPR)=0.24, balanced accuracy (BA)=0.69) and in SOS prediction (TPR=0.98, FPR=0.60, BA=0.69). Adding a rule-based method improved the algorithm's TPR, FPR, and BA. For the best algorithm, validation on an external dataset of drug-like compounds gave TPR=0.67 and FPR=0.00 for class prediction, and TPR=0.80 and FPR=0.44 for SOS prediction. Comparisons with standard ML models also show that our algorithm achieves higher predictive performance for classification on external datasets. Overall, these models and algorithms (SOS predictor) give accurate substrate class and site (SOS) predictions for SULT-mediated phase II metabolism and will be valuable to the drug discovery community in academia and industry. The SOS predictor is freely available for academic/non-profit research via the GitHub link.
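The reported balanced accuracies can be cross-checked against the reported TPR and FPR values. The sketch below assumes the standard definition BA = (TPR + TNR) / 2 with TNR = 1 - FPR; the abstract does not state which definition the authors used, and the function name is illustrative only.

```python
def balanced_accuracy(tpr, fpr):
    """Balanced accuracy from true and false positive rates.

    Assumes the standard definition BA = (TPR + TNR) / 2,
    where the true negative rate is TNR = 1 - FPR.
    """
    return (tpr + (1.0 - fpr)) / 2.0


# TPR/FPR pairs quoted in the abstract above.
class_ba = balanced_accuracy(tpr=0.62, fpr=0.24)  # class prediction
sos_ba = balanced_accuracy(tpr=0.98, fpr=0.60)    # SOS prediction

print(f"class prediction BA = {class_ba:.2f}")
print(f"SOS prediction BA   = {sos_ba:.2f}")
```

Under this definition both cases give 0.69, matching the reported values. The SOS case shows why BA is worth reporting alongside TPR: a very high TPR (0.98) paired with a high FPR (0.60) yields the same BA as the more moderate class-prediction figures.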
{"title":"Sulfotransferase-mediated phase II drug metabolism prediction of substrates and sites using accessibility and reactivity-based algorithms.","authors":"Shivam Kumar Vyas, Avik Das, Upadhyayula Suryanarayana Murty, Vaibhav A Dixit","doi":"10.1002/minf.202400008","DOIUrl":"10.1002/minf.202400008","url":null,"abstract":"<p><p>Sulphotransferases (SULTs) are a major phase II metabolic enzyme class contributing ~20 % to the Phase II metabolism of FDA-approved drugs. Ignoring the potential for SULT-mediated metabolism leaves a strong potential for drug-drug interactions, often causing late-stage drug discovery failures or black-boxed warnings on FDA labels. The existing models use only accessibility descriptors and machine learning (ML) methods for class and site of sulfonation (SOS) predictions for SULT. In this study, a variety of accessibility, reactivity, and hybrid models and algorithms have been developed to make accurate substrate and SOS predictions. Unlike the literature models, reactivity parameters for the aliphatic or aromatic hydroxyl groups (R/Ar-O-H), the Bond Dissociation Energy (BDE) gave accurate models with a True Positive Rate (TPR)=0.84 for SOS predictions. We offer mechanistic insights to explain these novel findings that are not recognized in the literature. The accessibility parameters like the ratio of Chemgauss4 Score (CGS) and Molecular Weight (MW) CGS/MW and distance from cofactor (Dis) were essential for class predictions and showed TPR=0.72. Substrates consistently had lower BDE, Dis, and CGS/MW than non-substrates. Hybrid models also performed acceptablely for SOS predictions. Using the best models, Algorithms gave an acceptable performance in class prediction: TPR=0.62, False Positive Rate (FPR)=0.24, Balanced accuracy (BA)=0.69, and SOS prediction: TPR=0.98, FPR=0.60, and BA=0.69. A rule-based method was added to improve the predictive performance, which improved the algorithm TPR, FPR, and BA. 
Validation using an external dataset of drug-like compounds gave class prediction: TPR=0.67, FPR=0.00, and SOS prediction: TPR=0.80 and FPR=0.44 for the best Algorithm. Comparisons with standard ML models also show that our algorithm shows higher predictive performance for classification on external datasets. Overall, these models and algorithms (SOS predictor) give accurate substrate class and site (SOS) predictions for SULT-mediated Phase II metabolism and will be valuable to the drug discovery community in academia and industry. The SOS predictor is freely available for academic/non-profit research via the GitHub link.</p>","PeriodicalId":18853,"journal":{"name":"Molecular Informatics","volume":" ","pages":"e202400008"},"PeriodicalIF":2.8,"publicationDate":"2024-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141897838","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2024-09-01 | Epub Date: 2024-06-12 | DOI: 10.1002/minf.202300335
Colin Bournez, José-Manuel Gally, Samia Aci-Sèche, Philippe Bernard, Pascal Bonnet
Natural products have long been an important source of inspiration for medicinal chemistry and drug discovery. In the cosmetic field, they remain the major elements of product composition and serve as a marketing asset. Recent research has shown that salt-inducible kinases are implicated in melanin production in skin via MITF regulation. Finding new potent modulators of such a target could open the way to several cosmetic applications that attenuate visible signs of photoaging and improve tanning without sun exposure. Since virtual screening can be a powerful tool for detecting hit compounds in the early stages of a drug discovery process, we applied this method to salt-inducible kinase 2 to discover potentially interesting compounds. Here, we present the successive steps, from the construction of a database of natural products to the validation of a docking protocol and the results of the virtual screening. Hits from the screening were tested in vitro to confirm their efficacy, and the results are discussed.
{"title":"Virtual screening of natural products to enhance melanogenosis.","authors":"Colin Bournez, José-Manuel Gally, Samia Aci-Sèche, Philippe Bernard, Pascal Bonnet","doi":"10.1002/minf.202300335","DOIUrl":"10.1002/minf.202300335","url":null,"abstract":"<p><p>Natural products have long been an important source of inspiration for medicinal chemistry and drug discovery. In the cosmetic field, they remain the major elements of the composition and serve as marketing asset. Recent research showed the implication of salt-inducible kinases on the melanin production in skin via MITF regulation. Finding new potent modulators on such target could open the way to several cosmetic applications to attenuate visible signs of photoaging and improve the tan without sun. Since virtual screening can be a powerful tool for detecting hit compounds in the early stages of a drug discovery process, we applied this method on salt-inducible kinase 2 to discover potential interesting compounds. Here, we present the different steps from the construction of a database of natural products, to the validation of a docking protocol and the results of the virtual screening. Hits from the screening were tested in vitro to confirm their efficiency and results are discussed.</p>","PeriodicalId":18853,"journal":{"name":"Molecular Informatics","volume":" ","pages":"e202300335"},"PeriodicalIF":2.8,"publicationDate":"2024-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141306333","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2024-09-01 | Epub Date: 2024-07-08 | DOI: 10.1002/minf.202300160
Shrilakshmi Sheshagiri Rao, Shankar V Kundapura, Debayan Dey, Chandrasekaran Palaniappan, Kanagaraj Sekar, Ananda Kulal, Udupi A Ramagopal
The insulin superfamily proteins (ISPs), in particular insulin, IGFs, and relaxin proteins, are key modulators of animal physiology. They are known to have evolved from the same ancestral gene and to have diverged into proteins with varied sequences and distinct functions, while maintaining a similar structural architecture stabilized by highly conserved disulphide bridges. The recent surge of sequence data and structures for these proteins prompted a need for a comprehensive analysis that examines the evolution of these sequences (427 sequences) in the light of available functional and structural information, including representative complex structures of ISPs with their cognate receptors. This study (a) reveals unusually high sequence conservation of IGFs (>90% conservation in 184 sequences) and provides a possible structure-based rationale for it; (b) provides an updated definition of the receptor-binding signature motif of the functionally diverse relaxin family members; and (c) identifies a probable non-canonical C-peptide cleavage site in a few insulin sequences. The high conservation of IGFs appears to represent a classic case of resistance to sequence diversity exerted by physiologically important interactions with multiple partners. We also propose a probable mechanism for C-peptide cleavage in a few distinct insulin sequences and redefine the receptor-binding signature motif of the relaxin family. Lastly, we provide a basis for minimally modified insulin mutants with potential therapeutic application, inspired by concomitant changes observed in other insulin superfamily members and supported by molecular dynamics simulation.
{"title":"Cumulative phylogenetic, sequence and structural analysis of Insulin superfamily proteins provide unique structure-function insights.","authors":"Shrilakshmi Sheshagiri Rao, Shankar V Kundapura, Debayan Dey, Chandrasekaran Palaniappan, Kanagaraj Sekar, Ananda Kulal, Udupi A Ramagopal","doi":"10.1002/minf.202300160","DOIUrl":"10.1002/minf.202300160","url":null,"abstract":"<p><p>The insulin superfamily proteins (ISPs), in particular, insulin, IGFs and relaxin proteins are key modulators of animal physiology. They are known to have evolved from the same ancestral gene and have diverged into proteins with varied sequences and distinct functions, but maintain a similar structural architecture stabilized by highly conserved disulphide bridges. The recent surge of sequence data and the structures of these proteins prompted a need for a comprehensive analysis, which connects the evolution of these sequences (427 sequences) in the light of available functional and structural information including representative complex structures of ISPs with their cognate receptors. This study reveals (a) unusually high sequence conservation of IGFs (>90 % conservation in 184 sequences) and provides a possible structure-based rationale for such high sequence conservation; (b) provides an updated definition of the receptor-binding signature motif of the functionally diverse relaxin family members (c) provides a probable non-canonical C-peptide cleavage site in a few insulin sequences. The high conservation of IGFs appears to represent a classic case of resistance to sequence diversity exerted by physiologically important interactions with multiple partners. We also propose a probable mechanism for C-peptide cleavage in a few distinct insulin sequences and redefine the receptor-binding signature motif of the relaxin family. 
Lastly, we provide a basis for minimally modified insulin mutants with potential therapeutic application, inspired by concomitant changes observed in other insulin superfamily protein members supported by molecular dynamics simulation.</p>","PeriodicalId":18853,"journal":{"name":"Molecular Informatics","volume":" ","pages":"e202300160"},"PeriodicalIF":2.8,"publicationDate":"2024-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141555195","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Assessing the blood-brain barrier (BBB) permeability of compounds poses a significant challenge in the discovery of drugs targeting the central nervous system. Conventional experimental approaches to measuring BBB permeability are labor-intensive, costly, and time-consuming. In this study, we constructed six machine learning classification models by combining various machine learning algorithms and molecular representations. The model based on the ExtraTree algorithm and a random partitioning strategy obtained the best prediction result, with an AUC value of 0.932±0.004 and a balanced accuracy (BA) of 0.837±0.010 on the test set. We employed the SHAP method to identify important features associated with BBB permeability. In addition, matched molecular pair (MMP) analysis and a representative substructure derivation method were used to uncover the transformation rules and distinctive structural features of BBB-permeable compounds. The machine learning models proposed in this work can serve as an effective tool for assessing BBB permeability in drug discovery for central nervous system diseases.
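As a reading aid for the "AUC ± spread" style of reporting above, the following minimal sketch computes ROC AUC as the fraction of (positive, negative) pairs that the classifier ranks correctly, then summarizes several runs as mean and sample standard deviation. The labels, scores, and per-split AUC values are invented toy data, and the abstract does not state whether its ± values are standard deviations, so that reading is an assumption.

```python
from statistics import mean, stdev


def roc_auc(labels, scores):
    """ROC AUC via pairwise comparison.

    Equals the probability that a randomly chosen positive receives a
    higher score than a randomly chosen negative; ties count as half.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy example: 1 = BBB-permeable, 0 = non-permeable (hypothetical data).
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.3, 0.2]
auc = roc_auc(labels, scores)
print(f"AUC = {auc:.3f}")

# Hypothetical AUCs from repeated random train/test partitions.
split_aucs = [0.93, 0.94, 0.92, 0.93, 0.94]
print(f"AUC = {mean(split_aucs):.3f} ± {stdev(split_aucs):.3f}")
```

The pairwise form is quadratic in the number of compounds, so it is suited to small illustrations; library implementations use a sort-based calculation that gives the same value.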
{"title":"Prediction of blood-brain barrier permeability using machine learning approaches based on various molecular representation.","authors":"Li Liang, Zhiwen Liu, Xinyi Yang, Yanmin Zhang, Haichun Liu, Yadong Chen","doi":"10.1002/minf.202300327","DOIUrl":"10.1002/minf.202300327","url":null,"abstract":"<p><p>The assessment of compound blood-brain barrier (BBB) permeability poses a significant challenge in the discovery of drugs targeting the central nervous system. Conventional experimental approaches to measure BBB permeability are labor-intensive, cost-ineffective, and time-consuming. In this study, we constructed six machine learning classification models by combining various machine learning algorithms and molecular representations. The model based on ExtraTree algorithm and random partitioning strategy obtains the best prediction result, with AUC value of 0.932±0.004 and balanced accuracy (BA) of 0.837±0.010 for the test set. We employed the SHAP method to identify important features associated with BBB permeability. In addition, matched molecular pair (MMP) analysis and representative substructure derivation method were utilized to uncover the transformation rules and distinctive structural features of BBB permeable compounds. 
The machine learning models proposed in this work can serve as an effective tool for assessing BBB permeability in the drug discovery for central nervous system disease.</p>","PeriodicalId":18853,"journal":{"name":"Molecular Informatics","volume":" ","pages":"e202300327"},"PeriodicalIF":2.8,"publicationDate":"2024-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141306332","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}