Molecular Informatics: Latest Articles

Chemoinformatics for corrosion science: Data-driven modeling of corrosion inhibition by organic molecules.

IF 2.8 · Zone 4 (Medicine) · Q3 CHEMISTRY, MEDICINAL

Molecular Informatics

Pub Date : 2024-11-01 Epub Date: 2024-10-15 DOI: 10.1002/minf.202400082

Igor Baskin, Yair Ein-Eli

This paper reviews the application of machine learning to the inhibition of corrosion by organic molecules. The methodologies considered include quantitative structure-property relationships (QSPR) and related data-driven approaches. The characteristic features of their key components, as applied to corrosion inhibition, are examined, including datasets, response properties, molecular descriptors, machine learning methods, and structure-property models. It is shown that the most important factors determining their choice and application are: (1) the small or very small size of datasets, (2) the mechanism of corrosion inhibition, which is associated with the adsorption of inhibitor molecules on the metal surface, and (3) the multifactorial dependence and noisiness of the response property. On this basis, the application of machine learning to the corrosion inhibition of iron-, aluminum-, and magnesium-based materials is considered. The main trends in the development of QSPR and related data-driven modeling of corrosion inhibition are discussed, shortcomings and common errors are considered, and prospects for further development are outlined.
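
Because the review stresses that corrosion-inhibition datasets are often very small, leave-one-out (LOO) cross-validation is one common way to estimate predictivity without setting aside a separate test set. The sketch below is an illustration only, not the review's own protocol: it fits a hypothetical one-descriptor linear QSPR model and reports the cross-validated Q² = 1 - PRESS/SS_tot. The function names and the descriptor/efficiency values are made up.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a + b * x; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def loo_q2(xs, ys):
    """Leave-one-out Q^2 = 1 - PRESS / SS_tot for a one-descriptor model."""
    press = 0.0
    for i in range(len(xs)):
        # Refit without compound i, then predict the held-out compound.
        intercept, slope = fit_line(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        press += (ys[i] - (intercept + slope * xs[i])) ** 2
    mean_y = sum(ys) / len(ys)
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - press / ss_tot


# Hypothetical data: one descriptor vs. inhibition efficiency (%) for six compounds.
descriptor = [1.2, 1.8, 2.5, 3.1, 3.9, 4.4]
efficiency = [55.0, 61.0, 70.0, 74.0, 83.0, 88.0]
print(f"LOO Q2 = {loo_q2(descriptor, efficiency):.3f}")
```

With only a handful of compounds, a single noisy measurement can shift Q² considerably, which is one reason the noisiness of the response property matters so much in this field.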

Citations: 0

My 50 Years with Chemoinformatics.

IF 2.8 · Zone 4 (Medicine) · Q3 CHEMISTRY, MEDICINAL

Molecular Informatics

Pub Date : 2024-11-01 Epub Date: 2024-10-15 DOI: 10.1002/minf.202400036

Johann Gasteiger

Citations: 0

Multi-Task ADME/PK prediction at industrial scale: leveraging large and diverse experimental datasets.

IF 2.8 · Zone 4 (Medicine) · Q3 CHEMISTRY, MEDICINAL

Molecular Informatics

Pub Date : 2024-10-01 Epub Date: 2024-07-08 DOI: 10.1002/minf.202400079

Moritz Walter, Jens M Borghardt, Lina Humbeck, Miha Skalic

ADME (Absorption, Distribution, Metabolism, Excretion) properties are key parameters for judging whether a drug candidate exhibits a desired pharmacokinetic (PK) profile. In this study, we tested multi-task machine learning (ML) models, trained on in-house data generated at Boehringer Ingelheim, to predict ADME and animal PK endpoints. Models were evaluated both at the design stage of a compound (i.e., no experimental data for test compounds available) and at the testing stage, when a particular assay would be conducted (i.e., experimental data from earlier assays may be available). Using realistic time splits, we found a clear performance benefit of multi-task graph-based neural network models over single-task models, which was even stronger when experimental data from earlier assays were available. In investigating the success of multi-task models, we found that the endpoints with the largest numbers of data points (physicochemical endpoints, clearance in microsomes) are especially responsible for the increased predictivity on more complex ADME and PK endpoints. In summary, our study provides insight into how data for multiple ADME/PK endpoints in a pharmaceutical company can best be leveraged to optimize the predictivity of ML models.
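
A time split trains on older measurements and tests on newer ones, which mimics how a model is used prospectively. The sketch below assumes each record is a hypothetical (compound_id, measurement_date) pair and uses a single global cut-off; the paper's actual split procedure may differ (for example, it may be defined per assay).

```python
from datetime import date


def time_split(records, test_fraction=0.2):
    """Chronological train/test split of (compound_id, measurement_date) records.

    The oldest measurements form the training set and the most recent
    `test_fraction` of records form the test set.
    """
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be between 0 and 1")
    ordered = sorted(records, key=lambda record: record[1])
    n_test = max(1, round(len(ordered) * test_fraction))
    return ordered[:-n_test], ordered[-n_test:]


# Hypothetical records: compound IDs and dates are made up.
records = [
    ("cpd-03", date(2021, 5, 2)),
    ("cpd-01", date(2019, 3, 14)),
    ("cpd-05", date(2023, 1, 20)),
    ("cpd-02", date(2020, 8, 7)),
    ("cpd-04", date(2022, 11, 30)),
]
train, test = time_split(records, test_fraction=0.4)
print([cid for cid, _ in train], [cid for cid, _ in test])
# ['cpd-01', 'cpd-02', 'cpd-03'] ['cpd-04', 'cpd-05']
```

Unlike a random split, no test compound can be "older" than any training compound, so the evaluation cannot benefit from information that would not yet have existed at prediction time.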

Citations: 0

Distinct binding hotspots for natural and synthetic agonists of FFA4 from in silico approaches.

IF 2.8 · Zone 4 (Medicine) · Q3 CHEMISTRY, MEDICINAL

Molecular Informatics

Pub Date : 2024-10-01 Epub Date: 2024-07-24 DOI: 10.1002/minf.202400046

Guillaume Patient, Corentin Bedart, Naim A Khan, Nicolas Renault, Amaury Farce

FFA4 has attracted interest in recent years, since its deorphanization in 2005 and the characterization of the free fatty acid receptor family, owing to the family's therapeutic potential in metabolic disorders. The expression of FFA4 (also known as GPR120) in numerous organs throughout the human body makes this receptor a highly promising target, particularly in fat sensing and dietary preference. This offers an attractive approach to tackling obesity and related metabolic diseases. Recent cryo-EM structures of the receptor have provided valuable information on a potential active state, although previous studies of FFA4 presented diverging information. We performed molecular docking and molecular dynamics simulations of four agonist ligands (TUG-891, linoleic acid, α-linolenic acid, and oleic acid) based on a homology model. Our simulations, totaling 2 μs, highlighted two binding hotspots at Arg99^2.64 and Lys293 (ECL3). The results indicate that these residues are located in separate areas of the binding pocket and interact with different types of ligands, implying different potential active states of FFA4 and a highly adaptable intra-receptor binding pocket. This article proposes additional structural characteristics and mechanisms of agonist binding that complement the experimental structures.
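
The superscript in Arg99^2.64 appears to follow the Ballesteros-Weinstein generic numbering used for class A GPCRs: X.YY denotes transmembrane helix X, position X.50 is assigned to the most conserved residue of that helix, and numbers increase toward the C-terminus. The sketch below shows only the offset arithmetic, assuming no insertions or deletions between the residue and the X.50 reference; real assignments are usually taken from curated resources such as GPCRdb rather than computed this way. The function names are hypothetical.

```python
def bw_label(residue_number, helix, ref_residue_number):
    """Ballesteros-Weinstein style label, e.g. '2.64'.

    ref_residue_number is the sequence number of the most conserved residue
    of the helix (position helix.50). Assumes a gap-free helix.
    """
    return f"{helix}.{50 + residue_number - ref_residue_number}"


def reference_residue(residue_number, label):
    """Sequence number implied for position X.50 by a label such as '2.64'."""
    _, position = label.split(".")
    return residue_number - (int(position) - 50)


# Under the gap-free assumption, Arg99 at 2.64 puts position 2.50 at residue 85.
ref = reference_residue(99, "2.64")
print(ref, bw_label(99, 2, ref))  # 85 2.64
```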

Citations: 0

Sulfotransferase-mediated phase II drug metabolism prediction of substrates and sites using accessibility and reactivity-based algorithms.

IF 2.8 · Zone 4 (Medicine) · Q3 CHEMISTRY, MEDICINAL

Molecular Informatics

Pub Date : 2024-10-01 Epub Date: 2024-08-07 DOI: 10.1002/minf.202400008

Shivam Kumar Vyas, Avik Das, Upadhyayula Suryanarayana Murty, Vaibhav A Dixit

Sulphotransferases (SULTs) are a major class of phase II metabolic enzymes, contributing ~20 % of the phase II metabolism of FDA-approved drugs. Ignoring potential SULT-mediated metabolism creates a substantial risk of drug-drug interactions, often causing late-stage drug discovery failures or black-box warnings on FDA labels. Existing models use only accessibility descriptors and machine learning (ML) methods to predict substrate class and site of sulfonation (SOS) for SULTs. In this study, a variety of accessibility, reactivity, and hybrid models and algorithms were developed to make accurate substrate and SOS predictions. Unlike the literature models, a reactivity parameter of the aliphatic or aromatic hydroxyl groups (R/Ar-O-H), the bond dissociation energy (BDE), gave accurate models with a true positive rate (TPR) of 0.84 for SOS predictions. We offer mechanistic insights to explain these novel findings, which have not been recognized in the literature. Accessibility parameters such as the ratio of the Chemgauss4 score (CGS) to molecular weight (MW), CGS/MW, and the distance from the cofactor (Dis) were essential for class predictions and gave TPR=0.72. Substrates consistently had lower BDE, Dis, and CGS/MW than non-substrates. Hybrid models also performed acceptably for SOS predictions. Using the best models, the algorithms gave acceptable performance for class prediction (TPR=0.62, false positive rate (FPR)=0.24, balanced accuracy (BA)=0.69) and for SOS prediction (TPR=0.98, FPR=0.60, BA=0.69). Adding a rule-based method improved the algorithms' TPR, FPR, and BA. For the best algorithm, validation on an external dataset of drug-like compounds gave TPR=0.67 and FPR=0.00 for class prediction, and TPR=0.80 and FPR=0.44 for SOS prediction. Comparisons with standard ML models also show that our algorithm achieves higher predictive performance for classification on external datasets. Overall, these models and algorithms (SOS predictor) give accurate substrate class and site (SOS) predictions for SULT-mediated phase II metabolism and will be valuable to the drug discovery community in academia and industry. The SOS predictor is freely available for academic/non-profit research via the GitHub link.
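
Balanced accuracy is the mean of sensitivity (TPR) and specificity (1 - FPR), so the reported BA values can be checked directly against the reported rates. This small worked check uses only the numbers quoted in the abstract; the helper name is hypothetical.

```python
def balanced_accuracy_from_rates(tpr, fpr):
    """Balanced accuracy = (sensitivity + specificity) / 2, specificity = 1 - FPR."""
    return (tpr + (1.0 - fpr)) / 2.0


# Rates reported in the abstract for the best models.
class_ba = balanced_accuracy_from_rates(0.62, 0.24)
sos_ba = balanced_accuracy_from_rates(0.98, 0.60)
print(f"class BA = {class_ba:.2f}, SOS BA = {sos_ba:.2f}")
# class BA = 0.69, SOS BA = 0.69
```

Both cases land at 0.69 despite very different trade-offs: the SOS predictor's very high TPR (0.98) is offset by its high FPR (0.60).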

Citations: 0

Cover Picture: (Mol. Inf. 9/2024)

IF 3.6 · Zone 4 (Medicine) · Q3 CHEMISTRY, MEDICINAL

Molecular Informatics

Pub Date : 2024-09-13 DOI: 10.1002/minf.202480901

Citations: 0

Virtual screening of natural products to enhance melanogenosis.

IF 2.8 · Zone 4 (Medicine) · Q3 CHEMISTRY, MEDICINAL

Molecular Informatics

Pub Date : 2024-09-01 Epub Date: 2024-06-12 DOI: 10.1002/minf.202300335

Colin Bournez, José-Manuel Gally, Samia Aci-Sèche, Philippe Bernard, Pascal Bonnet

Natural products have long been an important source of inspiration for medicinal chemistry and drug discovery. In the cosmetic field, they remain the major components of formulations and serve as a marketing asset. Recent research has implicated salt-inducible kinases in skin melanin production via MITF regulation. Finding new potent modulators of this target could open the way to several cosmetic applications, such as attenuating visible signs of photoaging and enhancing tanning without sun exposure. Since virtual screening can be a powerful tool for detecting hit compounds in the early stages of drug discovery, we applied this method to salt-inducible kinase 2 to discover potentially interesting compounds. Here, we present the different steps, from the construction of a natural product database to the validation of a docking protocol and the results of the virtual screening. Hits from the screening were tested in vitro to confirm their efficacy, and the results are discussed.

Citations: 0

Cumulative phylogenetic, sequence and structural analysis of Insulin superfamily proteins provide unique structure-function insights.

IF 2.8 · Zone 4 (Medicine) · Q3 CHEMISTRY, MEDICINAL

Molecular Informatics

Pub Date : 2024-09-01 Epub Date: 2024-07-08 DOI: 10.1002/minf.202300160

Shrilakshmi Sheshagiri Rao, Shankar V Kundapura, Debayan Dey, Chandrasekaran Palaniappan, Kanagaraj Sekar, Ananda Kulal, Udupi A Ramagopal

The insulin superfamily proteins (ISPs), in particular insulin, IGFs, and relaxin proteins, are key modulators of animal physiology. They are known to have evolved from the same ancestral gene and have diverged into proteins with varied sequences and distinct functions, but they maintain a similar structural architecture stabilized by highly conserved disulphide bridges. The recent surge in sequence and structural data for these proteins called for a comprehensive analysis that interprets the evolution of these sequences (427 sequences) in light of the available functional and structural information, including representative complex structures of ISPs with their cognate receptors. This study (a) reveals unusually high sequence conservation of IGFs (>90 % conservation in 184 sequences) and offers a possible structure-based rationale for it; (b) provides an updated definition of the receptor-binding signature motif of the functionally diverse relaxin family members; and (c) identifies a probable non-canonical C-peptide cleavage site in a few insulin sequences. The high conservation of IGFs appears to represent a classic case of resistance to sequence diversity imposed by physiologically important interactions with multiple partners. We also propose a probable mechanism for C-peptide cleavage in a few distinct insulin sequences and redefine the receptor-binding signature motif of the relaxin family. Lastly, inspired by concomitant changes observed in other insulin superfamily members and supported by molecular dynamics simulations, we provide a basis for minimally modified insulin mutants with potential therapeutic application.
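
The abstract does not specify how the ">90 % conservation" figure was scored. One simple definition, used here only as an illustration, is the percentage of sequences sharing the most frequent residue in each alignment column, averaged over columns. The alignment below is made up, and treating gaps as an ordinary symbol is an assumption; other scores (for example, entropy-based ones) would give different numbers.

```python
from collections import Counter


def column_conservation(alignment):
    """Percent of sequences sharing the most common residue at each column.

    `alignment` is a list of equal-length aligned sequences; gaps ('-')
    are counted as an ordinary symbol in this sketch.
    """
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("sequences must be aligned to equal length")
    scores = []
    for col in range(length):
        residues = [seq[col] for seq in alignment]
        most_common_count = Counter(residues).most_common(1)[0][1]
        scores.append(100.0 * most_common_count / len(alignment))
    return scores


def overall_conservation(alignment):
    """Mean column conservation (%) across the whole alignment."""
    scores = column_conservation(alignment)
    return sum(scores) / len(scores)


# Hypothetical toy alignment of four sequences.
toy = ["MKTAYL", "MKTAYL", "MKSAYL", "MKTAYL"]
print(column_conservation(toy))  # [100.0, 100.0, 75.0, 100.0, 100.0, 100.0]
print(f"{overall_conservation(toy):.1f}")  # 95.8
```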

Citations: 0

Prediction of blood-brain barrier permeability using machine learning approaches based on various molecular representation.

IF 2.8 · Zone 4 (Medicine) · Q3 CHEMISTRY, MEDICINAL

Molecular Informatics

Pub Date : 2024-09-01 Epub Date: 2024-06-12 DOI: 10.1002/minf.202300327

Li Liang, Zhiwen Liu, Xinyi Yang, Yanmin Zhang, Haichun Liu, Yadong Chen

Assessing the blood-brain barrier (BBB) permeability of compounds poses a significant challenge in the discovery of drugs targeting the central nervous system. Conventional experimental approaches to measuring BBB permeability are labor-intensive, costly, and time-consuming. In this study, we constructed six machine learning classification models by combining various machine learning algorithms and molecular representations. The model based on the ExtraTree algorithm with a random partitioning strategy gave the best predictions, with an AUC of 0.932±0.004 and a balanced accuracy (BA) of 0.837±0.010 on the test set. We employed the SHAP method to identify important features associated with BBB permeability. In addition, matched molecular pair (MMP) analysis and a representative substructure derivation method were used to uncover the transformation rules and distinctive structural features of BBB-permeable compounds. The machine learning models proposed in this work can serve as an effective tool for assessing BBB permeability in drug discovery for central nervous system diseases.
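
The two test-set metrics reported above measure different things: ROC AUC summarizes how well predicted scores rank BBB-permeable compounds above non-permeable ones across all thresholds, while balanced accuracy averages sensitivity and specificity at a single threshold. A dependency-free sketch of both follows; the labels, scores, and the 0.5 threshold are hypothetical. In practice, scikit-learn's `roc_auc_score` and `balanced_accuracy_score` compute these quantities (the latter from predicted labels rather than scores).

```python
def roc_auc(labels, scores):
    """ROC AUC as the probability that a random positive outscores a random
    negative (Mann-Whitney formulation); ties count as one half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def balanced_accuracy_at(labels, scores, threshold=0.5):
    """Mean of sensitivity and specificity when scores >= threshold are positive."""
    tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= threshold)
    tn = sum(1 for y, s in zip(labels, scores) if y == 0 and s < threshold)
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    return 0.5 * (tp / n_pos + tn / n_neg)


# Hypothetical predictions: 1 = BBB-permeable, 0 = non-permeable.
labels = [1, 1, 0, 0, 1, 0]
scores = [0.9, 0.4, 0.6, 0.1, 0.8, 0.3]
print(f"AUC = {roc_auc(labels, scores):.3f}")  # AUC = 0.889
print(f"BA = {balanced_accuracy_at(labels, scores):.3f}")  # BA = 0.667
```

AUC is threshold-free, so a model can rank compounds well (high AUC) yet still have a modest BA if the chosen threshold is poorly placed.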

Citations: 0

Cover Picture: (Mol. Inf. 8/2024)

IF 3.6 · Zone 4 (Medicine) · Q3 CHEMISTRY, MEDICINAL

Molecular Informatics

Pub Date : 2024-08-12 DOI: 10.1002/minf.202480801

Citations: 0
