SPOT: A machine learning model that predicts specific substrates for transport proteins.

IF 9.8 | Zone 1 (Biology) | Q1 Agricultural and Biological Sciences | PLoS Biology | Pub Date: 2024-09-26 | eCollection Date: 2024-09-01 | DOI: 10.1371/journal.pbio.3002807
Alexander Kroll, Nico Niebuhr, Gregory Butler, Martin J Lercher
Publication type: Journal Article
Full text (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11426516/pdf/
Citations: 0

Abstract

Transport proteins play a crucial role in cellular metabolism and are central to many aspects of molecular biology and medicine. Determining the function of transport proteins experimentally is challenging, as they become unstable when isolated from cell membranes. Machine learning-based predictions could provide an efficient alternative. However, existing methods are limited to predicting a small number of specific substrates or broad transporter classes. These limitations stem partly from using small data sets for model training and a choice of input features that lack sufficient information about the prediction problem. Here, we present SPOT, the first general machine learning model that can successfully predict specific substrates for arbitrary transport proteins, achieving an accuracy above 92% on independent and diverse test data covering widely different transporters and a broad range of metabolites. SPOT uses Transformer Networks to represent transporters and substrates numerically. To overcome the problem of missing negative data for training, it augments a large data set of known transporter-substrate pairs with carefully sampled random molecules as non-substrates. SPOT not only predicts specific transporter-substrate pairs, but also outperforms previously published models designed to predict broad substrate classes for individual transport proteins. We provide a web server and a Python function that allow users to explore the substrate scope of arbitrary transporters.
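
The negative-data augmentation mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not say how the random molecules are "carefully sampled", so the code below simply assumes uniform sampling from a molecule pool, excluding each transporter's known substrates. The function name `augment_with_negatives`, its parameters, and the toy identifiers are hypothetical and are not taken from SPOT's code.

```python
import random
from collections import defaultdict


def augment_with_negatives(positive_pairs, molecule_pool,
                           negatives_per_positive=1, seed=0):
    """Label known pairs 1 and add randomly sampled non-substrates labeled 0.

    Assumption (not from the paper): negatives are drawn uniformly from
    molecule_pool, skipping molecules already known as substrates of the
    same transporter.
    """
    rng = random.Random(seed)
    positives = sorted(set(positive_pairs))  # drop duplicate pairs

    # Map each transporter to the set of its known substrates.
    known = defaultdict(set)
    for transporter, substrate in positives:
        known[transporter].add(substrate)

    labeled = [(t, s, 1) for t, s in positives]
    pool = sorted(set(molecule_pool))
    for transporter, substrates in sorted(known.items()):
        candidates = [m for m in pool if m not in substrates]
        wanted = min(len(candidates),
                     negatives_per_positive * len(substrates))
        for molecule in rng.sample(candidates, wanted):
            labeled.append((transporter, molecule, 0))
    return labeled


if __name__ == "__main__":
    pairs = [("T1", "glucose"), ("T1", "fructose"), ("T2", "lysine")]
    pool = ["glucose", "fructose", "lysine", "citrate", "urea"]
    for row in augment_with_negatives(pairs, pool):
        print(row)
```

In a real pipeline, the transporter and molecule identifiers would be replaced by numerical representations before training a classifier. A molecule sampled this way may still be a true but undiscovered substrate, which is one reason a more careful sampling scheme than this uniform one is desirable.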

Source journal
PLoS Biology (Biochemistry & Molecular Biology; Biology)
CiteScore: 15.40
Self-citation rate: 2.00%
Articles published: 359
Review time: 3-8 weeks
Journal description: PLOS Biology is the flagship journal of the Public Library of Science (PLOS) and focuses on publishing groundbreaking and relevant research in all areas of biological science. The journal features works at various scales, ranging from molecules to ecosystems, and also encourages interdisciplinary studies. PLOS Biology publishes articles that demonstrate exceptional significance, originality, and relevance, with a high standard of scientific rigor in methodology, reporting, and conclusions. The journal aims to advance science and serve the research community by transforming research communication to align with the research process. It offers evolving article types and policies that empower authors to share the complete story behind their scientific findings with a diverse global audience of researchers, educators, policymakers, patient advocacy groups, and the general public. PLOS Biology, along with other PLOS journals, is widely indexed by major services such as Crossref, Dimensions, DOAJ, Google Scholar, PubMed, PubMed Central, Scopus, and Web of Science. Additionally, PLOS Biology is indexed by various other services including AGRICOLA, Biological Abstracts, BIOSIS Previews, CABI CAB Abstracts, CABI Global Health, CAPES, CAS, CNKI, Embase, Journal Guide, MEDLINE, and Zoological Record, ensuring that the research content is easily accessible and discoverable by a wide range of audiences.
Latest articles in this journal
Gather your neurons and model together: Community times ahead.
Biomedical researchers' perspectives on the reproducibility of research.
Community-based reconstruction and simulation of a full-scale model of the rat hippocampus CA1 region.
Harnessing plant biosynthesis for the development of next-generation therapeutics.
Transcriptomic analysis of the 12 major human breast cell types reveals mechanisms of cell and tissue function.