SPOT: A machine learning model that predicts specific substrates for transport proteins.

PLoS Biology; Impact factor: 9.8; Zone 1 (Biology); JCR Q1 (Agricultural and Biological Sciences); Pub Date: 2024-09-26; eCollection Date: 2024-09-01; DOI: 10.1371/journal.pbio.3002807

Alexander Kroll, Nico Niebuhr, Gregory Butler, Martin J Lercher

PLoS Biology 22(9): e3002807 (2024). Full text (PDF): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11426516/pdf/

Citations: 0

Abstract

Transport proteins play a crucial role in cellular metabolism and are central to many aspects of molecular biology and medicine. Determining the function of transport proteins experimentally is challenging, as they become unstable when isolated from cell membranes. Machine learning-based predictions could provide an efficient alternative. However, existing methods are limited to predicting a small number of specific substrates or broad transporter classes. These limitations stem partly from using small data sets for model training and a choice of input features that lack sufficient information about the prediction problem. Here, we present SPOT, the first general machine learning model that can successfully predict specific substrates for arbitrary transport proteins, achieving an accuracy above 92% on independent and diverse test data covering widely different transporters and a broad range of metabolites. SPOT uses Transformer Networks to represent transporters and substrates numerically. To overcome the problem of missing negative data for training, it augments a large data set of known transporter-substrate pairs with carefully sampled random molecules as non-substrates. SPOT not only predicts specific transporter-substrate pairs, but also outperforms previously published models designed to predict broad substrate classes for individual transport proteins. We provide a web server and a Python function that allow users to explore the substrate scope of arbitrary transporters.
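The negative-data augmentation mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not say how the random molecules are "carefully sampled", so the code below simply assumes uniform random sampling from a molecule pool, excluding each transporter's known substrates. The function name `sample_negatives`, its parameters, and the toy data are hypothetical and are not taken from SPOT.

```python
import random


def sample_negatives(positive_pairs, molecule_pool, n_per_positive=1, seed=0):
    """Pair each transporter with random molecules not known to be its substrates.

    Assumption: uniform sampling; the paper's actual sampling scheme may differ.
    """
    rng = random.Random(seed)

    # Collect the known substrates of every transporter.
    known = {}
    for transporter, substrate in positive_pairs:
        known.setdefault(transporter, set()).add(substrate)

    negatives = []
    for transporter, substrates in known.items():
        # Candidates are all pool molecules that are not known substrates.
        candidates = sorted(set(molecule_pool) - substrates)
        # Draw without replacement, capped by the number of candidates.
        k = min(n_per_positive * len(substrates), len(candidates))
        negatives.extend((transporter, m) for m in rng.sample(candidates, k))
    return negatives


# Toy data (hypothetical identifiers).
positives = [("T1", "glucose"), ("T1", "fructose"), ("T2", "lysine")]
pool = ["glucose", "fructose", "lysine", "citrate", "urea"]

negatives = sample_negatives(positives, pool, n_per_positive=2)

# Label 1 = known transporter-substrate pair, 0 = sampled non-substrate.
labelled = [(t, m, 1) for t, m in positives] + [(t, m, 0) for t, m in negatives]
```

Randomly drawn molecules are only presumed non-substrates, so some label noise is possible in a scheme like this; a more careful sampling strategy, as the abstract indicates, is meant to address that.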

Source journal: PLoS Biology (Biochemistry & Molecular Biology; Biology)
CiteScore: 15.40
Self-citation rate: 2.00%
Articles published: 359
Review time: 3-8 weeks

About the journal: PLOS Biology is the flagship journal of the Public Library of Science (PLOS) and focuses on publishing groundbreaking and relevant research in all areas of biological science. The journal features works at various scales, ranging from molecules to ecosystems, and also encourages interdisciplinary studies. PLOS Biology publishes articles that demonstrate exceptional significance, originality, and relevance, with a high standard of scientific rigor in methodology, reporting, and conclusions. The journal aims to advance science and serve the research community by transforming research communication to align with the research process. It offers evolving article types and policies that empower authors to share the complete story behind their scientific findings with a diverse global audience of researchers, educators, policymakers, patient advocacy groups, and the general public. PLOS Biology, along with other PLOS journals, is widely indexed by major services such as Crossref, Dimensions, DOAJ, Google Scholar, PubMed, PubMed Central, Scopus, and Web of Science. Additionally, PLOS Biology is indexed by various other services including AGRICOLA, Biological Abstracts, BIOSIS Previews, CABI CAB Abstracts, CABI Global Health, CAPES, CAS, CNKI, Embase, Journal Guide, MEDLINE, and Zoological Record, ensuring that the research content is easily accessible and discoverable by a wide range of audiences.