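Prediction of matrilineal specific patatin-like protein governing in-vivo maternal haploid induction in maize using support vector machine and di-peptide composition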

Amino Acids | IF 3 | Zone 3 (Biology) | JCR Q3 (Biochemistry & Molecular Biology) | Pub Date: 2024-03-09 | DOI: 10.1007/s00726-023-03368-0
Suman Dutta, Rajkumar U. Zunjare, Anirban Sil, Dwijesh Chandra Mishra, Alka Arora, Nisrita Gain, Gulab Chand, Rashmi Chhabra, Vignesh Muthusamy, Firoz Hossain
Citations: 0

Prediction of matrilineal specific patatin-like protein governing in-vivo maternal haploid induction in maize using support vector machine and di-peptide composition

The mutant matrilineal (mtl) gene, which encodes a patatin-like phospholipase, is involved in in-vivo maternal haploid induction in maize. Doubling the chromosomes of haploids by colchicine treatment fixes inbred lines completely in a single generation, compared with 6–7 generations of selfing. Knowledge of patatin-like proteins in other crops is therefore of great significance for in-vivo haploid induction. So far, no online tool is available that can classify unknown proteins as patatin-like. Here, we aimed to optimize a machine learning-based algorithm to predict the patatin-like phospholipase activity of unknown proteins. Four kernels [radial basis function (RBF), sigmoid, polynomial, and linear] were used to build support vector machine (SVM) classifiers on six sequence-based compositional features (AAC, DPC, GDPC, CTDC, CTDT, and GAAC). A total of 1170 protein sequences were analyzed: 585 patatin-like proteins from various monocots, dicots, and microbes, and 585 non-patatin-like proteins from different subspecies of Zea mays. The RBF and polynomial kernels were quite promising in predicting patatin-like proteins. Among the six sequence-based compositional features, di-peptide composition attained > 90% prediction accuracy with the RBF and polynomial kernels. Using mutual information, the dipeptides that contributed most to the prediction were identified. The knowledge generated in this study can be used in other crops before any experiment is initiated. The developed SVM model opens a new paradigm for scientists working on in-vivo haploid induction in commercial crops. This is the first report of machine learning being used to identify proteins with patatin-like activity.
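
The abstract names di-peptide composition (DPC) as the best-performing feature but does not define it. The sketch below assumes the common convention in which DPC is a 400-dimensional vector holding the relative frequency of each ordered pair of the 20 standard amino acids. The names `AMINO_ACIDS`, `DIPEPTIDES`, and `dipeptide_composition` are illustrative and not taken from the paper, and the authors' actual feature extraction may differ.

```python
from itertools import product

# The 20 standard amino acids (one-letter codes).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# All 400 ordered dipeptides, in a fixed order so vectors are comparable.
DIPEPTIDES = ["".join(pair) for pair in product(AMINO_ACIDS, repeat=2)]


def dipeptide_composition(sequence):
    """Return a 400-dimensional DPC vector for a protein sequence.

    Assumption: each entry is the count of one dipeptide divided by the
    number of valid overlapping dipeptides in the sequence. Pairs that
    contain a non-standard residue (for example 'X') are skipped.
    """
    seq = sequence.upper()
    counts = dict.fromkeys(DIPEPTIDES, 0)
    total = 0
    for i in range(len(seq) - 1):
        pair = seq[i:i + 2]
        if pair in counts:
            counts[pair] += 1
            total += 1
    if total == 0:
        # Sequence too short or made only of non-standard residues.
        return [0.0] * len(DIPEPTIDES)
    return [counts[dp] / total for dp in DIPEPTIDES]


# Hypothetical toy sequence, for illustration only.
vector = dipeptide_composition("MAMA")
```

Vectors of this form, one per protein and labelled patatin-like or non-patatin-like, could then be passed to an SVM implementation, for instance scikit-learn's `SVC` with `kernel="rbf"` or `kernel="poly"`. Whether the authors used that library is not stated in the abstract.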

Source journal
Amino Acids (Biology: Biochemistry & Molecular Biology)
CiteScore: 6.40
Self-citation rate: 5.70%
Articles published: 99
Review time: 2.2 months
Journal description: Amino Acids publishes contributions from all fields of amino acid and protein research: analysis, separation, synthesis, biosynthesis, cross-linking amino acids, racemization/enantiomers, modification of amino acids such as phosphorylation, methylation, acetylation, glycosylation and nonenzymatic glycosylation, new roles for amino acids in physiology and pathophysiology, biology, amino acid analogues and derivatives, polyamines, radiated amino acids, peptides, stable isotopes and isotopes of amino acids. Applications in medicine, food chemistry, nutrition, gastroenterology, nephrology, neurochemistry, pharmacology, and excitatory amino acids are just some of the topics covered. Fields of interest include biochemistry, food chemistry, nutrition, neurology, psychiatry, pharmacology, nephrology, gastroenterology, and microbiology.