A method for constructing interpretable hidden Markov models for the task of identifying binding cores in sequences

D.A. Kleverov, A.A. Shalyto, M.N. Artyomov
Journal: Scientific and Technical Journal of Information Technologies, Mechanics and Optics
Published: 2023-10-01
DOI: 10.17586/2226-1494-2023-23-5-989-1000 (https://doi.org/10.17586/2226-1494-2023-23-5-989-1000)

Abstract

Predicting the immune response against fragments of foreign protein sequences processed by cells is one of the major milestones on the road to personalized cancer vaccine development. The selection of peptides that participate in the immune response is a complex multi-stage process in which the initial sequences are filtered and their fragments are presented on the cell surface. The most studied stage of this filtering today is the prediction of the binding probability of peptides to major histocompatibility complex molecules. Modern methods for predicting this stage are usually based on artificial neural networks, which makes it impossible to interpret the predictions of such models. One way to overcome this limitation is to use interpretable hidden Markov models. In this work, the binding prediction task is analyzed. As a result, a method for constructing interpretable models that takes domain-specific constraints and requirements into account is proposed. A method for the construction, training, and interpretation of hidden Markov models is proposed for each class of molecules. Construction and training are based on maintaining a model architecture capable of extracting and visualizing the binding core of the peptide. Interpretation is possible through analysis of the model graph. The proposed method is tested on the task of training a model that not only enables prediction but also helps determine the position of the peptide binding core and the distribution of amino acids within the core. Prediction models were trained for two types of molecules using binding data. The distributions of amino acids in the binding core match the state distributions of the model. Sequence patterns of such regions, extracted with the trained models for two sets of peptide data, correspond to patterns from public databases, which validates the method.
Interpretable models provide a better description of the problem domain and help draw conclusions about peptide characteristics from information extracted from the model. This information will allow researchers to better understand other steps of peptide processing involved in the immune response. For example, one can study relationships between these steps or transfer knowledge from models trained for one step to others. Using this knowledge will allow models to be trained when training data is limited.
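
The abstract does not specify the model topology or its parameters, so the following is only a minimal sketch of how a hidden Markov model with a constrained architecture can expose a binding core through Viterbi decoding. Everything in it is an assumption made for illustration, not the authors' method: the flank/core state layout, the names `build_core_hmm`, `find_core` and `core_profile`, the `stay` probability, the toy three-residue core (real cores are longer), and the hand-set emission distributions, which a real model would learn from binding data.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
NEG_INF = float("-inf")


def safe_log(p):
    """Natural log that maps zero probability to -inf instead of raising."""
    return math.log(p) if p > 0 else NEG_INF


def peaked(preferred, p=0.9):
    """Toy emission distribution: mass p on one residue, the rest spread evenly."""
    rest = (1.0 - p) / (len(AMINO_ACIDS) - 1)
    return {a: (p if a == preferred else rest) for a in AMINO_ACIDS}


def build_core_hmm(core_emissions, stay=0.5):
    """Hypothetical left-to-right HMM: N_flank -> core_1 .. core_K -> C_flank."""
    k = len(core_emissions)
    uniform = {a: 1.0 / len(AMINO_ACIDS) for a in AMINO_ACIDS}
    names = ["N_flank"] + [f"core_{i + 1}" for i in range(k)] + ["C_flank"]
    emissions = [uniform] + list(core_emissions) + [uniform]
    n = k + 2
    trans = [[0.0] * n for _ in range(n)]
    trans[0][0] = stay            # remain in the N-terminal flank
    trans[0][1] = 1.0 - stay      # enter the core
    for i in range(1, k + 1):
        trans[i][i + 1] = 1.0     # core states are visited strictly in order
    trans[k + 1][k + 1] = 1.0     # remain in the C-terminal flank
    start = [0.0] * n
    start[0], start[1] = stay, 1.0 - stay
    return names, start, trans, emissions


def viterbi(seq, names, start, trans, emissions, final_states):
    """Most probable state path (log space) that ends in one of final_states."""
    n = len(names)
    score = [safe_log(start[s]) + safe_log(emissions[s].get(seq[0], 0.0))
             for s in range(n)]
    back = []
    for ch in seq[1:]:
        new_score, pointers = [], []
        for s in range(n):
            prev = max(range(n), key=lambda p: score[p] + safe_log(trans[p][s]))
            new_score.append(score[prev] + safe_log(trans[prev][s])
                             + safe_log(emissions[s].get(ch, 0.0)))
            pointers.append(prev)
        score = new_score
        back.append(pointers)
    last = max(final_states, key=lambda s: score[s])
    if score[last] == NEG_INF:
        raise ValueError("sequence cannot be explained by the model")
    path = [last]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    path.reverse()
    return [names[s] for s in path]


def find_core(seq, model):
    """Return (offset, core) read off the Viterbi path of the peptide."""
    names = model[0]
    k = len(names) - 2
    # The path must end in the last core state or in the C-terminal flank,
    # so the whole core always lies inside the peptide.
    path = viterbi(seq, *model, final_states=(k, k + 1))
    offset = path.index("core_1")
    return offset, seq[offset:offset + k]


def core_profile(cores):
    """Count residues at each core position across extracted cores."""
    return [Counter(column) for column in zip(*cores)]


# Toy model: a 3-residue core that prefers L, A, K at positions 1, 2, 3.
model = build_core_hmm([peaked("L"), peaked("A"), peaked("K")])
offset, core = find_core("GGSLAKTT", model)
print(offset, core)  # 3 LAK
```

Because every core position has its own state, the emission distribution of `core_i` can be read directly as the amino acid preference at position `i`. This is the sense in which such a model graph is interpretable. `core_profile` gives the matching empirical counts from the cores the model extracts, so the two can be compared.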