Decoding the impact of neighboring amino acids on ESI-MS intensity output through deep learning

IF 4.6 Q2 MATERIALS SCIENCE, BIOMATERIALS ACS Applied Bio Materials Pub Date : 2024-09-26 DOI:10.1016/j.jprot.2024.105322
Naim Abdul-Khalek, Reinhard Wimmer, Michael Toft Overgaard, Simon Gregersen Echers
Full text: https://www.sciencedirect.com/science/article/pii/S1874391924002549

Abstract

Peptide-level quantification using mass spectrometry (MS) is no trivial task, as the physicochemical properties of a peptide affect both response and detectability. The specific amino acid (AA) sequence affects these properties; however, the connection between sequence and intensity output remains poorly understood. In this work, we explore combinations of amino acid pairs (i.e., dimer motifs) to determine a potential relationship between the local amino acid environment and MS1 intensity. For this purpose, a deep learning (DL) model, consisting of an encoder-decoder with an attention mechanism, was built. The attention mechanism made it possible to identify the most relevant motifs. Two patterns were consistently found to be important for predicting MS1 intensity: a bulky/aromatic and hydrophobic AA followed by a cationic AA, and consecutive bulky/aromatic and hydrophobic AAs. Correlating attention weights with mean MS1 intensities revealed that some important motifs, particularly those containing Trp, His, and Cys, were linked with low-responding peptides, whereas motifs containing Lys and most bulky hydrophobic AAs were often associated with high-responding peptides. Moreover, Asn-Gly was associated with low response. The model predicts MS1 response with a mean average percentage error of ∼11% and a Pearson correlation coefficient of ∼0.64. While dimer representation of peptide sequences did not improve predictive capacity compared to the single-AA representation used in earlier work, this work adds valuable insight for a better understanding of peptide response in MS analysis.
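
To make the idea of a dimer representation concrete, the following minimal sketch splits a peptide sequence into two-residue motifs and counts them across a small set of peptides. It assumes overlapping windows with a stride of one residue; the abstract does not state how the authors tokenized sequences, and the function name `dimer_motifs` and the example peptides are hypothetical.

```python
from collections import Counter


def dimer_motifs(peptide: str) -> list[str]:
    """Return the overlapping two-residue windows of a one-letter sequence.

    Assumption: stride of 1, so 'AWKLF' yields AW, WK, KL, LF.
    The paper's actual tokenization may differ.
    """
    return [peptide[i:i + 2] for i in range(len(peptide) - 1)]


# Made-up peptides for illustration only; not taken from the study.
peptides = ["AWKLF", "NGWK"]

# Count how often each dimer occurs across all peptides.
counts = Counter(d for p in peptides for d in dimer_motifs(p))

print(dimer_motifs("AWKLF"))
print(counts.most_common(3))
```

In a model like the one described, each dimer would then be mapped to an integer index or embedding before being fed to the encoder; that step is omitted here because the abstract gives no details about it.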

Significance

Mass spectrometry is not inherently quantitative, and the response of a compound depends not only on its concentration but also on its molecular composition. For mass spectrometry-based analysis of peptides, such as in bottom-up proteomics, this implies that the response cannot be used directly to quantify individual peptides. Moreover, the dependency of the response on the amino acid sequence of individual peptides remains poorly understood. Using a deep learning model based on a recurrent neural network with an attention mechanism, we here investigate how the presence of dimer motifs within a peptide affects the MS1 response by analyzing intended equimolar peptide pools comprising almost 200,000 unique peptides in total. Not only do we identify certain dimer classes and specific dimers that substantially affect the MS1 response, but the model is also able to predict peptide intensity with low error rates within the independent test subset. The findings not only improve our understanding of the link between sequence and response for peptides but also highlight the potential of deep learning for developing methods for absolute, label-free peptide quantification.
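
The two performance figures quoted in the abstract, a percentage error and a Pearson correlation coefficient, can be computed as in the sketch below. It assumes that the reported percentage error is the usual mean absolute percentage error (MAPE); the abstract does not give the formula or say whether intensities were transformed beforehand. The function names and the numbers are hypothetical and only illustrate the arithmetic.

```python
import math


def mape(y_true, y_pred):
    """Mean absolute percentage error, in percent (assumed definition)."""
    return 100.0 * sum(abs((t - p) / t) for t, p in zip(y_true, y_pred)) / len(y_true)


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


# Made-up observed and predicted intensities; not data from the study.
observed = [8.0, 10.0, 12.0]
predicted = [7.2, 11.0, 12.0]

print(f"MAPE: {mape(observed, predicted):.2f} %")
print(f"Pearson r: {pearson_r(observed, predicted):.3f}")
```

The two measures answer different questions: MAPE reflects how far individual predictions are from the observed values, while Pearson r reflects how well the predictions track the ranking and linear trend of the observations, so a model can score well on one and only moderately on the other.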
