Molecular Informatics: Latest Articles

Predicting S. aureus antimicrobial resistance with interpretable genomic space maps.
IF 3.6, Zone 4 (Medicine), Q1 Chemistry. Pub Date: 2024-05-01. Epub Date: 2024-02-22. DOI: 10.1002/minf.202300263
Karina Pikalyova, Alexey Orlov, Dragos Horvath, Gilles Marcou, Alexandre Varnek

Increasing antimicrobial resistance (AMR) represents a global healthcare threat. To decrease the spread of AMR and associated mortality, methods for rapid selection of optimal antibiotic treatment are urgently needed. Machine learning (ML) models based on genomic data to predict resistant phenotypes can serve as a fast screening tool prior to phenotypic testing. Nonetheless, many existing ML methods lack interpretability. Therefore, we present a methodology for visualization of sequence space and AMR prediction based on the non-linear dimensionality reduction method - generative topographic mapping (GTM). This approach, applied to AMR data of >5000 S. aureus isolates retrieved from the PATRIC database, yielded GTM models with reasonable accuracy for all drugs (balanced accuracy values ≥0.75). The Generative Topographic Maps (GTMs) represent data in the form of illustrative maps of the genomic space and allow for antibiotic-wise comparison of resistant phenotypes. The maps were also found to be useful for the analysis of genetic determinants responsible for drug resistance. Overall, the GTM-based methodology is a useful tool for both the illustrative exploration of the genomic sequence space and AMR prediction.
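
The balanced accuracy quoted in the abstract is the mean of sensitivity and specificity, which makes it less sensitive to class imbalance than plain accuracy. The following is a minimal Python sketch of that metric only; the labels are made up for illustration (1 = resistant, 0 = susceptible) and are not data from the study, and the function name is hypothetical.

```python
def balanced_accuracy(y_true, y_pred):
    """Mean of sensitivity and specificity for binary labels (1 = resistant)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both classes must be present in y_true")
    sensitivity = tp / (tp + fn)  # fraction of resistant isolates recovered
    specificity = tn / (tn + fp)  # fraction of susceptible isolates recovered
    return (sensitivity + specificity) / 2


# Illustrative, made-up labels: 2 resistant and 6 susceptible isolates.
y_true = [1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 0, 0, 0, 0, 0, 0, 1]
print(round(balanced_accuracy(y_true, y_pred), 3))
```

In this toy case plain accuracy is 6/8 = 0.75, while balanced accuracy is lower because half of the resistant isolates are missed.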

Citations: 0
Cover Picture: (Mol. Inf. 4/2024)
IF 3.6, Zone 4 (Medicine), Q1 Chemistry. Pub Date: 2024-04-23. DOI: 10.1002/minf.202480401
Citations: 0
Discovery of a pocket network on the domain 5 of the TrkB receptor – A potential new target in the quest for the new ligands
IF 3.6, Zone 4 (Medicine), Q1 Chemistry. Pub Date: 2024-04-15. DOI: 10.1002/minf.202400043
Mirjana Antonijevic, Jana Sopkova‐de Oliveira Santos, Patrick Dallemagne, Christophe Rochais
The important role of the neurotrophin tyrosine kinase receptor TrkB in the pathogenesis of several neurodegenerative conditions, such as Alzheimer's disease, Parkinson's disease, and Huntington's disease, has been well described. This is not surprising, since under physiological conditions, once activated by brain‐derived neurotrophic factor (BDNF) and neurotrophin‐4/5 (NT‐4/5), the TrkB receptor promotes neuronal survival, differentiation, and synaptic function. Because the natural ligands of the TrkB receptor are large proteins, discovering small molecules capable of mimicking their effects is a challenge. Even though the receptor surface that interacts with BDNF or NT‐4/5 is known, it has remained unclear which pocket and which interactions are responsible for its activation. To answer this question, we used molecular dynamics (MD) simulations and the Pocketron algorithm, which enabled us to detect, for the first time, a pocket network in the interacting domain (d5) of the receptor, to describe its pockets, and to see how they communicate with each other. This discovery reveals potential new areas on the receptor that can be targeted in structure‐based drug design of new ligands.
Citations: 0
Automatic generation of functional peptides with desired bioactivity and membrane permeability using Bayesian optimization.
IF 3.6, Zone 4 (Medicine), Q1 Chemistry. Pub Date: 2024-04-01. Epub Date: 2024-02-19. DOI: 10.1002/minf.202300148
Itsuki Fukunaga, Yuki Matsukiyo, Kazuma Kaitoh, Yoshihiro Yamanishi

Peptides are a potentially useful drug modality; however, cell membrane permeability is an obstacle in peptide drug discovery. Identifying bioactive peptides for a therapeutic target is also challenging because of the huge number of possible amino acid sequence patterns. In this study, we propose a novel computational method, PEptide generation system using Neural network Trained on Amino acid sequence data and Gaussian process-based optimizatiON (PENTAGON), to automatically generate new peptides with desired bioactivity and cell membrane permeability. In the algorithm, we mapped peptide amino acid sequences onto the latent space constructed using a variational autoencoder and searched for peptides with desired bioactivity and cell membrane permeability using Bayesian optimization. We used our proposed method to generate peptides with cell membrane permeability and bioactivity for each of nine therapeutic targets, such as the estrogen receptor (ER). Our proposed method outperformed a previously developed peptide generator in terms of similarity to known active peptide sequences and the length of generated peptide sequences.

Citations: 0
Synthetically accessible de novo design using reaction vectors: Application to PARP1 inhibitors.
IF 3.6, Zone 4 (Medicine), Q1 Chemistry. Pub Date: 2024-04-01. Epub Date: 2024-02-06. DOI: 10.1002/minf.202300183
Gian Marco Ghiandoni, Stuart R Flanagan, Michael J Bodkin, Maria Giulia Nizi, Albert Galera-Prat, Annalaura Brai, Beining Chen, James E A Wallace, Dimitar Hristozov, James Webster, Giuseppe Manfroni, Lari Lehtiö, Oriana Tabarrini, Valerie J Gillet

De novo design has been a hotly pursued topic for many years. Most recent developments have involved the use of deep learning methods for generative molecular design. Despite increasing levels of algorithmic sophistication, the design of molecules that are synthetically accessible remains a major challenge. Reaction-based de novo design takes a conceptually simpler approach and aims to address synthesisability directly by mimicking synthetic chemistry and driving structural transformations by known reactions that are applied in a stepwise manner. However, the use of a small number of hand-coded transformations restricts the chemical space that can be accessed and there are few examples in the literature where molecules and their synthetic routes have been designed and executed successfully. Here we describe the application of reaction-based de novo design to the design of synthetically accessible and biologically active compounds as proof-of-concept of our reaction vector-based software. Reaction vectors are derived automatically from known reactions and allow access to a wide region of synthetically accessible chemical space. The design was aimed at producing molecules that are active against PARP1 and which have improved brain penetration properties compared to existing PARP1 inhibitors. We synthesised a selection of the designed molecules according to the provided synthetic routes and tested them experimentally. The results demonstrate that reaction vectors can be applied to the design of novel molecules of biological relevance that are also synthetically accessible.

Citations: 0
An ensemble-based approach to estimate confidence of predicted protein-ligand binding affinity values.
IF 3.6, Zone 4 (Medicine), Q1 Chemistry. Pub Date: 2024-04-01. Epub Date: 2024-02-15. DOI: 10.1002/minf.202300292
Milad Rayka, Morteza Mirzaei, Ali Mohammad Latifi

When designing a machine learning-based scoring function, we have access to only a limited number of protein-ligand complexes with experimentally determined binding affinity values, representing a small fraction of all possible protein-ligand complexes. Consequently, it is crucial to report a measure of confidence and to quantify the uncertainty in the model's predictions at test time. Here, we adopt the conformal prediction technique to evaluate the confidence of a prediction for each member of the core set of the CASF 2016 benchmark. The conformal prediction technique requires a diverse ensemble of predictors for uncertainty estimation. To this end, we introduce ENS-Score as an ensemble predictor, which includes 30 models with different protein-ligand representation approaches and achieves a Pearson's correlation of 0.842 on the core set of the CASF 2016 benchmark. We also comprehensively investigate the residual error of each data point to assess the normality of the distribution of the residual errors and their correlation with structural features of the ligands, such as hydrophobic interactions and halogen bonding. Finally, we provide a locally hosted web application to facilitate the use of ENS-Score. All code needed to reproduce the results is available at https://github.com/miladrayka/ENS_Score.
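
To illustrate how an ensemble can feed a conformal predictor, here is a minimal Python sketch of split (inductive) conformal regression in which the ensemble mean is the point prediction and the ensemble spread normalises the nonconformity score. This is one common variant and an assumption on our part; it is not taken from the ENS-Score repository, and all function names and numbers are hypothetical.

```python
import math
import statistics

EPS = 1e-8  # avoids division by zero when all ensemble members agree


def conformal_quantile(scores, alpha):
    """Finite-sample conformal quantile of calibration scores at level 1 - alpha."""
    n = len(scores)
    # Small tolerance guards against floating-point noise just above an integer.
    k = math.ceil((n + 1) * (1 - alpha) - 1e-9)
    if k > n:
        return math.inf  # too few calibration points for this confidence level
    return sorted(scores)[k - 1]


def calibrate(ensemble_preds, y_true, alpha=0.1):
    """ensemble_preds: one list of member predictions per calibration complex."""
    scores = []
    for preds, y in zip(ensemble_preds, y_true):
        mean = statistics.fmean(preds)
        spread = statistics.pstdev(preds) + EPS
        scores.append(abs(y - mean) / spread)  # normalised absolute residual
    return conformal_quantile(scores, alpha)


def predict_interval(preds, q):
    """Prediction interval for one test complex from its ensemble predictions."""
    mean = statistics.fmean(preds)
    half_width = q * (statistics.pstdev(preds) + EPS)
    return mean - half_width, mean + half_width


# Made-up calibration data: four complexes, three ensemble members each.
cal_preds = [[5.1, 5.4, 4.9], [7.2, 6.8, 7.5], [3.9, 4.4, 4.1], [6.0, 6.6, 6.3]]
cal_true = [5.6, 6.9, 4.0, 7.0]
q = calibrate(cal_preds, cal_true, alpha=0.2)
print(predict_interval([6.1, 6.9, 6.4], q))
```

Complexes on which the ensemble members disagree receive wider intervals, which is the practical reason a diverse ensemble helps here.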

Citations: 0
Application of machine learning-based read-across structure-property relationship (RASPR) as a new tool for predictive modelling: Prediction of power conversion efficiency (PCE) for selected classes of organic dyes in dye-sensitized solar cells (DSSCs).
IF 3.6, Zone 4 (Medicine), Q1 Chemistry. Pub Date: 2024-04-01. Epub Date: 2024-02-19. DOI: 10.1002/minf.202300210
Souvik Pore, Arkaprava Banerjee, Kunal Roy

The application of in silico approaches for predicting material properties has been an effective alternative to experimental methods. Recently, the concepts of quantitative structure-property relationship (QSPR) and read-across (RA) were merged to develop a new chemoinformatic tool: read-across structure-property relationship (RASPR). The RASPR method is applicable to both large and small datasets, as it uses various similarity- and error-based measures. RASPR models have also been observed to have higher external predictivity than the corresponding QSPR models. In this study, we modeled the power conversion efficiency (PCE) of organic dyes used in dye-sensitized solar cells (DSSCs) using the quantitative RASPR (q-RASPR) method. We used three relatively large classes of organic dyes for modelling: phenothiazines (n=207), porphyrins (n=281), and triphenylamines (n=229). We divided each dataset into training and test sets in three different combinations; with the training sets we developed three different QSPR models using structural and physicochemical descriptors and validated them with the corresponding test sets. The descriptors selected in these models were used to calculate the RASPR descriptors with a Java-based tool, RASAR Descriptor Calculator v2.0 (https://sites.google.com/jadavpuruniversity.in/dtc-lab-software/home), and data fusion was then performed by pooling the previously selected structural and physicochemical descriptors with the calculated RASPR descriptors. A further feature selection step was applied to develop the final RASPR PLS models. We also developed different machine learning (ML) models with the descriptors selected in the QSPR PLS and RASPR PLS models, and found that models with RASPR descriptors outperformed, in external predictivity, the models with only structural and physicochemical descriptors: RMSEP decreased for phenothiazines from 1.16-1.25 to 1.07-1.18, for porphyrins from 1.60-1.79 to 1.45-1.53, and for triphenylamines from 1.27-1.54 to 1.20-1.47.
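
RMSEP, the metric used for the comparison above, is the root mean squared error of prediction computed on the external test set: the square root of the mean squared difference between observed and predicted values. A minimal Python sketch follows; the PCE values are invented for illustration and the function name is hypothetical.

```python
import math


def rmsep(y_observed, y_predicted):
    """Root mean squared error of prediction over an external test set."""
    if len(y_observed) != len(y_predicted) or not y_observed:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(o - p) ** 2 for o, p in zip(y_observed, y_predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Made-up test-set PCE values (%), not taken from the study.
observed = [5.2, 6.8, 4.1, 7.5]
predicted = [4.6, 7.4, 5.0, 6.9]
print(round(rmsep(observed, predicted), 2))
```

Because RMSEP is expressed in the units of the modelled property, a drop such as the one reported for porphyrins reads directly as a smaller typical error in predicted PCE.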

Citations: 0
Cover Picture: (Mol. Inf. 3/2024)
IF 3.6, Zone 4 (Medicine), Q1 Chemistry. Pub Date: 2024-03-21. DOI: 10.1002/minf.202480301
Citations: 0
Exploring data-driven chemical SMILES tokenization approaches to identify key protein-ligand binding moieties.
IF 3.6, Zone 4 (Medicine), Q1 Chemistry. Pub Date: 2024-03-01. Epub Date: 2024-01-23. DOI: 10.1002/minf.202300249
Asu Busra Temizer, Gökçe Uludoğan, Rıza Özçelik, Taha Koulani, Elif Ozkirimli, Kutlu O Ulgen, Nilgun Karali, Arzucan Özgür

Machine learning models have found numerous successful applications in computational drug discovery. A large body of these models represents molecules as sequences, since molecular sequences are easily available, simple, and informative. Sequence-based models often segment molecular sequences into pieces called chemical words, analogous to the words that make up sentences in human languages, and then apply advanced natural language processing techniques for tasks such as de novo drug design, property prediction, and binding affinity prediction. However, the chemical characteristics and significance of these building blocks, the chemical words, remain unexplored. To address this gap, we employ data-driven SMILES tokenization techniques such as Byte Pair Encoding, WordPiece, and Unigram to identify chemical words and compare the resulting vocabularies. To understand the chemical significance of these words, we build a language-inspired pipeline that treats high-affinity ligands of protein targets as documents and selects key chemical words making up those ligands based on tf-idf weighting. The experiments on multiple protein-ligand affinity datasets show that despite differences in words, lengths, and validity among the vocabularies generated by different subword tokenization algorithms, the identified key chemical words exhibit similarity. Further, we conduct case studies on a number of targets to analyze the impact of key chemical words on binding. We find that these key chemical words are specific to protein targets and correspond to known pharmacophores and functional groups. Our approach elucidates chemical properties of the words identified by machine learning models and can be used in drug discovery studies to determine significant chemical moieties.
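
The tf-idf step described above can be sketched in a few lines of Python. The sketch assumes each protein target is one "document" made of the chemical words of its high-affinity ligands, and it uses raw term frequency with an unsmoothed logarithmic idf; the abstract does not state which tf-idf variant the authors used, so that choice, the function name, the target names, and the toy tokens are all assumptions for illustration.

```python
import math
from collections import Counter


def tfidf_by_target(documents):
    """documents: dict mapping target name -> list of chemical words.

    Returns dict mapping target name -> {word: tf-idf weight}.
    """
    n_docs = len(documents)
    doc_freq = Counter()
    for words in documents.values():
        doc_freq.update(set(words))  # count each word once per target

    weights = {}
    for target, words in documents.items():
        counts = Counter(words)
        total = len(words)
        weights[target] = {
            word: (count / total) * math.log(n_docs / doc_freq[word])
            for word, count in counts.items()
        }
    return weights


# Hypothetical targets and chemical words (SMILES fragments).
docs = {
    "target_A": ["c1ccccc1", "C(=O)N", "c1ccccc1"],
    "target_B": ["c1ccccc1", "S(=O)(=O)N"],
}
weights = tfidf_by_target(docs)
for target, w in weights.items():
    print(target, max(w, key=w.get))
```

A word that occurs in the ligands of every target, such as the benzene ring in this toy data, gets weight zero, so the highest-weighted words are the ones specific to a target.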

Citations: 0
In silico construction of a focused fragment library facilitating exploration of chemical space.
IF 3.6 Zone 4 Medicine Q1 Chemistry Pub Date: 2024-03-01 Epub Date: 2024-01-23 DOI: 10.1002/minf.202300256
Weijie Han, Xiaohe Xu, Qing Fan, Yingchao Yan, YanMin Zhang, Yadong Chen, Haichun Liu

Fragment-based drug design (FBDD) has emerged as a captivating subject in computer-aided drug design, enabling the generation of novel molecules through the rearrangement of ring systems within known compounds. The construction of a focused fragment library plays a pivotal role in FBDD, as it requires compiling all potential bioactive ring systems capable of interacting with a specific target. In our study, we propose a workflow for developing a focused fragment library and a combinatorial compound library. The fragment library comprises seed fragments and collected fragments. The extraction of seed fragments is guided by receptor information and is a prerequisite for establishing a focused library. Collected fragments, in contrast, are obtained using the feature graph method, which offers a simplified representation of fragments and strikes a balance between diversity and similarity when categorizing different fragments. The use of the feature graph facilitates the rational partitioning of chemical space at the fragment level, enabling the exploration of the desired chemical space and making compound library screening more efficient. Analysis demonstrates that our workflow enables the enumeration of a greater number of entirely new potential compounds, thereby aiding the rational design of drugs.

Citations: 0