Binding Activity Classification of Anti-SARS-CoV-2 Molecules using Deep Learning Across Multiple Assays

IF 3.8, Zone 4 (Medicine), Q2 MEDICINE, GENERAL & INTERNAL. Balkan Medical Journal. Pub Date: 2024-05-03. Epub Date: 2024-03-11. DOI: 10.4274/balkanmedj.galenos.2024.2024-1-73

Bilge Eren Yamasan, Selçuk Korkmaz

Balkan Medical Journal, pages 186-192. Journal Article. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11077922/pdf/

Abstract

Background: The coronavirus disease-2019 (COVID-19) pandemic, caused by severe acute respiratory syndrome-coronavirus-2 (SARS-CoV-2), has urgently necessitated effective therapeutic solutions, with a focus on rapidly identifying and classifying potential small-molecule drugs. Given traditional methods’ labor-intensive and time-consuming nature, deep learning has emerged as an essential tool for efficiently processing and extracting insights from complex biological data.

Aims: To use deep learning, particularly deep neural networks (DNN) combined with the synthetic minority oversampling technique (SMOTE), to improve the classification of binding activities of anti-SARS-CoV-2 molecules across various bioassays.

Methods: We used 11 bioassay datasets covering various SARS-CoV-2 interactions and inhibitory mechanisms. These assays ranged from spike-ACE2 protein-protein interaction to ACE2 enzymatic activity and 3CL enzymatic activity. To address the prevalent class imbalance in these datasets, the SMOTE technique was employed to generate new samples for the minority class. In our model-building approach, we divided the dataset into 80% training and 20% test sets, reserving 10% of the training set for validation. Our approach involved employing a DNN that integrates ReLU and sigmoid activation functions, incorporates batch normalization, and uses Adam optimization. The hyperparameters and architecture of the DNN were optimized through various tests on layers, minibatch sizes, epoch sizes, and learning rates. A 40% dropout rate was incorporated to mitigate overfitting. For model evaluation, we computed performance metrics, such as balanced accuracy (BACC), precision, recall, F1 score, Matthews’ correlation coefficient (MCC), and area under the curve (AUC).
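The split and oversampling steps described in the Methods can be illustrated with a minimal, standard-library sketch. It is not the authors' implementation: the toy data, the 1:9 class ratio, the helper names `train_val_test_split` and `smote`, the choice of `k=5`, and the decision to oversample only the training portion are all assumptions made for illustration. The core SMOTE idea it shows is that each synthetic minority sample is placed at a random point on the line segment between a real minority sample and one of its nearest minority neighbours.

```python
import math
import random


def train_val_test_split(n, rng, test_frac=0.2, val_frac=0.1):
    """Shuffle indices, hold out 20% for testing, then 10% of the rest for validation."""
    idx = list(range(n))
    rng.shuffle(idx)
    n_test = round(n * test_frac)
    test, train = idx[:n_test], idx[n_test:]
    n_val = round(len(train) * val_frac)
    val, train = train[:n_val], train[n_val:]
    return train, val, test


def smote(minority, n_new, k, rng):
    """Create n_new synthetic points by interpolating between a minority
    sample and one of its k nearest minority neighbours."""
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> other
        synthetic.append([b + gap * (o - b) for b, o in zip(base, other)])
    return synthetic


rng = random.Random(0)

# Hypothetical toy data: 40 "active" and 360 "inactive" compounds, 2 descriptors each.
X = [[rng.gauss(2, 1), rng.gauss(2, 1)] for _ in range(40)]
X += [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(360)]
y = [1] * 40 + [0] * 360

train, val, test = train_val_test_split(len(X), rng)

# Assumption: oversample the training portion only, so that validation and
# test sets keep the original class ratio.
train_pos = [X[i] for i in train if y[i] == 1]
train_neg = [X[i] for i in train if y[i] == 0]
new_pos = smote(train_pos, len(train_neg) - len(train_pos), k=5, rng=rng)

print(len(train), len(val), len(test))
print(len(train_pos) + len(new_pos), len(train_neg))
```

After this step the training set contains equal numbers of active and inactive examples, while the held-out sets are untouched. In practice a maintained implementation of SMOTE would normally be preferred over a hand-written one.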

Results: The performance of the DNN across 11 bioassay test sets revealed varying outcomes, significantly influenced by the ratios of active-to-inactive compounds. Assays, such as AlphaLISA and CoV-PPE, demonstrated robust performance across various metrics, including BACC, precision, recall, and AUC, when configured with more balanced ratios (1:3 and 1:1, respectively). This suggests the effective identification of active compounds in both cases. In contrast, assays with higher imbalance ratios, such as 3CL (1:38) and cytopathic effect (1:15), demonstrated higher recall but lower precision, highlighting challenges in accurately identifying active compounds among numerous inactive compounds. However, even in these challenging settings, the model achieved favorable BACC and recall scores. Overall, the DNN model generally performed well, as indicated by the BACC, MCC, and AUC values, especially when considering the degree of dataset imbalance in each assay.
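The pattern of high recall with low precision under strong imbalance follows directly from how the metrics are defined. The sketch below computes BACC, precision, recall, F1, and MCC from confusion-matrix counts. The counts are hypothetical, not results from the study: they assume a classifier with 80% recall and 90% specificity applied once at a 1:1 ratio and once at a 1:38 ratio. AUC is omitted because it requires ranked scores rather than counts.

```python
import math


def confusion_metrics(tp, fp, tn, fn):
    """Compute threshold-based classification metrics from confusion-matrix counts."""
    recall = tp / (tp + fn)          # fraction of actives that are found
    specificity = tn / (tn + fp)     # fraction of inactives that are rejected
    precision = tp / (tp + fp)       # fraction of predicted actives that are real
    bacc = (recall + specificity) / 2
    f1 = 2 * precision * recall / (precision + recall)
    mcc = (tp * tn - fp * fn) / math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
    )
    return {
        "bacc": bacc,
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "mcc": mcc,
    }


# Hypothetical counts: 100 actives, recall 0.8, specificity 0.9 in both cases.
balanced = confusion_metrics(tp=80, fn=20, tn=90, fp=10)        # 1:1 ratio
imbalanced = confusion_metrics(tp=80, fn=20, tn=3420, fp=380)   # 1:38 ratio

for name, m in (("1:1", balanced), ("1:38", imbalanced)):
    print(name, {key: round(value, 3) for key, value in m.items()})
```

With identical recall and specificity, BACC is the same in both settings, but precision, F1, and MCC fall sharply at 1:38 because even a 10% false-positive rate on 3,800 inactives produces far more false positives than there are true actives. This is why BACC and MCC are more informative than raw accuracy when comparing assays with different imbalance ratios.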

Conclusion: This study demonstrates the significant impact of deep learning, particularly DNN models enhanced with SMOTE, in improving the identification of active compounds in bioassay datasets for COVID-19 drug discovery, outperforming traditional machine learning models. Furthermore, this study highlights the efficacy of advanced computational techniques in addressing high-throughput screening data imbalances.

Source journal: Balkan Medical Journal (MEDICINE, GENERAL & INTERNAL)

CiteScore: 4.10

Self-citation rate: 6.70%

Review time: 6-12 weeks

About the journal: The Balkan Medical Journal (Balkan Med J) is a peer-reviewed, open-access international journal that publishes clinical and experimental research from all fields of medicine, case reports and clinical images, invited reviews, editorials, comments, and letters to the Editor, including reports on publication and research ethics. The journal is the official scientific publication of the Trakya University Faculty of Medicine, Edirne, Turkey, and is printed six times a year, in January, March, May, July, September, and November. The language of the journal is English. The journal follows independent, unbiased, double-blind peer review. Only unpublished papers that are not under review elsewhere can be submitted. The Balkan Medical Journal does not accept multiple or duplicate submissions, even if the earlier version was published in a different language. The authors are responsible for the scientific content of the material to be published. The Balkan Medical Journal reserves the right to request any research materials on which the paper is based. The journal encourages and enables academicians, researchers, specialists, and primary care physicians of Balkan countries to publish their research in all branches of medicine. Its primary aim is to publish original articles of high scientific and ethical quality and to serve as a good example of medical publishing in the Balkans and worldwide.